Abstract

This study expands on previous National Center for Research on Evaluation, Standards, 
and Student Testing (CRESST) work that has undertaken the articulation of the academic 
language construct for broad educational purposes. The primary goal was to describe the 
language of textbook selections in terms of vocabulary, grammar, and organization of 
discourse for test development. Specifically, the work reported here focused on the
academic English used in fifth-grade mathematics, science, and social studies textbooks.
It contributes to the conceptual discussion of the nature of academic language and
provides concrete guidelines for test development. The vocabulary analyses included
measures of lexical diversity, word frequency, and frequency of multisyllabic and 
derived words, as well as variety of clause connectors and frequency of nominalizations. 
In addition, academic vocabulary was identified as specialized (within a discipline) and 
general (across disciplines). The grammatical analyses included the following features: 
sentence type, clause type, passive verb forms, prepositional phrases, noun phrases, and 
participial modifiers. The discourse analyses captured the organizational features of the 
selections on three levels: rhetorical mode (e.g., exposition and persuasion) and both 
dominant and supporting text features (e.g., description, classification, and paraphrase). The 
analyses of textbook language reported here have provided the bases for a profile of 
typical texts in each subject at the fifth grade that will be part of the foundation for 
developing academic language proficiency tests. We conclude by illustrating how these 
profiles can be used in the creation of test specifications. 
CHAPTER 1: INTRODUCTION

Under the No Child Left Behind (NCLB) legislation of 2001, educators face the 
challenge of assuring that all students ultimately meet rigorous academic standards 
and demonstrate yearly progress towards that goal. For English learners, the task is 
especially challenging because they are faced with acquiring English language skills 
at the same time they are expected to learn academic content in English in a range of 
subjects. The role of language in the different subject areas impacts English learners' 
ability to learn across disciplines (e.g., Butler & Castellon-Wellington, 2000). Thus, in 
order to provide adequate support and assess progress in language and other 
subject matter, it is important for educators to understand the specific role of English 
within academic subject areas such as mathematics, science, and so forth. Research 
that focuses on English language development (ELD) in and for academic contexts 
can play a critical role in helping to address the challenges faced by educators and 
students (e.g., August & Hakuta, 1997; Gottlieb, 2003; MacSwan & Rolstad, 2003). 

This study expands on previous work involving English learners conducted at 
CRESST by further specifying the construct of academic language and refining a 
research procedure that can be extended to additional grade levels and subject areas. 
Specifically, we analyzed textbook language in three subject areas — mathematics, 
science, and social studies — at the fifth grade. The analyses yielded descriptions of 
vocabulary, grammar, and organization of discourse. The immediate goal is to use 
the information in this report to generate test specifications for academic language 
tasks for assessing the English language proficiency of English learners. The 
information from this research effort, however, could have broader applications by 
providing empirical evidence about the nature of the language all students must 
understand and use in and across subjects. Test developers, teacher trainers, 
curriculum developers in subject areas, and textbook writers will all benefit from 
this information. 

The report is organized as follows: First, in this chapter, previous CRESST 
research on academic language is summarized; next, our current research focus and 
goals are discussed, followed by an overview of the methodology used in the study. 
Chapters 2 through 5 present the findings, including descriptive statistics about the 
text selections in Chapter 2, a discussion of the lexical features identified in texts in 
Chapter 3, grammatical features in Chapter 4, and discourse (organizational)
features in Chapter 5. Chapter 6 discusses the findings and demonstrates their 
applicability to CRESST test development work. Chapter 7 provides conclusions and 
recommendations. 

Previous CRESST Research on Academic Language 

The work reported here builds on multiple CRESST studies, including research 
on subgroups of English learners, academic language proficiency, the validity and 
use of standardized English-language content assessments with English learners, 
and operationalization of the academic language construct.¹ The initial work began
with a focus on the use of test accommodations with English learners on
standardized content assessments and the articulation of subgroups of English
learners on the basis of multiple variables (Butler & Stevens, 1997).² Language
proficiency was identified as one of the most important variables "in characterizing 
English language learners and... [as] essential for identifying the interface between 
language and content knowledge in standards-based assessments" (p. 24). In the 
conclusion, the authors call for the development of measures of academic language 
proficiency based on the documentation of classroom language. 

Subsequent research was undertaken to characterize the features of academic 
language at the middle school level. Butler, Stevens, and Castellon-Wellington 
(1999) created content-based academic language tasks for seventh-grade classrooms 
using social science texts as the content base. This research documents initial steps 
towards operationalizing and assessing academic language, specifically for the 
purpose of "establishing a threshold level of academic language proficiency in 
English through the use of a series of cross-modality assessments that build on 
language commonalties in different content areas" (p. 2). In other words, the goal of 
the work was to develop specifications for assessing academic language proficiency 
through the use of different types of authentic reading and writing tasks that reflect 
language use common to multiple subjects — mathematics, science, social studies, 
and language arts. Establishing a threshold level of academic language proficiency
involves determining at what point student performance in the classroom and on
standardized content assessments reflects subject-area knowledge and not the lack
of English language skills.

¹ Operationalize in this context means providing descriptions of academic language based on
empirical data that include enough specificity to allow for the development of specifications and
tasks/items for assessing academic language proficiency.

² The word content used in these earlier studies refers to school subjects such as mathematics, science,
social studies, etc. In this report, we use the term subject instead to refer to mathematics, science, and
social studies to avoid confusion with linguistic use of the term "content word" used in Chapter 3.

CRESST researchers concurrently investigated performance differences 
between native English speakers and English learners on standardized content 
assessments (Abedi, Courtney, & Leon, 2001; Butler & Castellon-Wellington, 2000),
the language demands of standardized content assessments in English (Bailey, 
2000), and the relationship between the language assessed on a language proficiency 
reading subtest and the language used on standardized content assessments 
(Stevens, Butler, & Castellon-Wellington, 2000). Butler and Castellon-Wellington 
(2000) reported the differential performance of English learners on content 
assessments based on language proficiency classification (e.g., limited English 
proficiency). The findings indicate issues of test validity for this group of learners
and again call for measures of academic language proficiency, a language
register that is relevant to the language of standardized assessments. Analysis of the
language demands of standardized mathematics, science and English language arts 
(ELA) (i.e., reading comprehension) assessments revealed that all three content areas 
present challenging syntax and vocabulary to greater or lesser degrees in the 
majority of test items (Bailey, 2000). However, mathematics and science items
differed from ELA in the amount of connected discourse students are required to 
process. 

In Stevens et al. (2000), the language assessed in an English proficiency reading 
subtest for seventh-grade English learners was found to differ substantially from the 
type of language used in a social studies assessment, for example, in terms of having 
less specialized content vocabulary, less complex syntax typically associated with 
subject-area materials, and a smaller variety of item-response formats and prompts. 
Additionally, despite the fact that all the students in this study were grouped as 
"limited English proficient" by the subtest, the subject-matter assessment results 
showed differential levels of performance within the group, indicating that the 
English assessment may not have adequately separated groups of students at the 
upper levels of the limited-English-proficiency range, and that, as indicated by the 
students, factors other than language, such as opportunity to learn, contributed to 
performance. 

More recent CRESST work has focused on operationalizing the academic 
language construct and documenting the features of academic English that are 
discipline-specific and those that cut across subject areas for the purposes of
developing language assessment specifications and tasks across grade clusters and 
subject areas. Based on a series of observations of fourth-, fifth-, and sixth-grade 
mainstream science classrooms, Bailey, Butler, LaFramenta, and Ong (2004) 
developed matrices to illustrate the intersection between contexts of instruction 
(science concepts, vocabulary, and application instruction) and language functions, 
repair strategies, and classroom management talk. There was great variability in the 
degree to which teachers held students accountable for verbalization of their 
knowledge, with only some teachers, for example, requiring students to provide 
fully elaborated explanations for their scientific claims. This source of variation in 
teacher discourse style has implications for student learning and assessment and is 
documented for other subject areas (e.g., mathematical discourse) as well (e.g., 
Kazemi, 1999). Overt instruction of subject-area vocabulary (i.e., specialized
academic vocabulary) was frequent; however, it relied primarily on
examples rather than on formal definitions, and it occurred more often than overt
instruction of general academic vocabulary (i.e., vocabulary words that are used
across subject areas in academic contexts).

In response to the call for providing stronger evidentiary bases to educational 
research (Feuer, Towne, & Shavelson, 2002; National Research Council, 2002), Bailey 
and Butler (2003) argued for and presented an evidence-based research framework 
for operationalizing academic language. The framework lays out six specific types of 
evidence to serve as bases for describing academic language. Relevant empirical 
data have been culled from the CRESST research mentioned above, including data on
fourth- through eighth-grade textbooks and materials, third-, seventh-, eighth- and 
eleventh-grade content assessments, and fourth- through sixth-grade classroom 
discourse. Currently, research is being conducted to fill in gaps where no empirical 
research exists. 

In Butler, Lord, Stevens, Borrego, and Bailey (2004) the evidence-based 
framework is applied at one grade level to supplement the classroom discourse 
research mentioned above. Butler et al. (2004) focused on science and mathematics 
standards and textbooks at the fifth-grade level. Analysis of the textbooks helped to 
characterize: (a) the nature of the language students are expected to read and 
understand and (b) the language demands implicit in the tasks they are called upon 
to perform. Features investigated include language functions that were exemplified 
in the texts and implicitly required in the tasks, grammatical features including 
sentence length and number of embedded clauses, and lexical features including the
identification of general and content-specific academic vocabulary. Marked 
differences were observed between the types of texts that appear most frequently in 
science textbooks compared to those in mathematics textbooks, providing empirical 
evidence of the existence of discipline-specific language. Additionally, the analyses 
of functional language show that while texts in science and mathematics vary 
considerably, the language functions students must use in the two subject areas and 
the language structures associated with those functions are similar, providing 
evidence of general academic language beyond the vocabulary level as well. 

The CRESST studies have played a crucial role in shaping our conceptual 
understanding and in refining our research design. Specifically, we have been able 
to capitalize on the methodologies that were developed for the characterization of 
textbook language (Butler et al., 2004). Other research efforts (e.g., Cazden, 2001; 
Chamot & O'Malley, 1994; Reppen, 2001; Scarcella, 2003; Schleppegrell, 2001; Short, 
1993) provided guidance in the choice of the different analyses and are referred to in 
the general methodology discussion and other relevant chapters below. In the next 
section we present our research focus and goals for the current study. 

Current Research Focus and Goals 

As a continuation of our effort to operationalize academic language, the study 
reported here provides descriptions of the language used in selected, representative 
fifth-grade mathematics, science, and social studies textbooks. These data expand on 
previous, exploratory analyses of fifth-grade mathematics and science textbooks, 
which provided preliminary empirical data on the language being used in the two 
subjects (Butler et al., 2004). Our work is focused at the fifth grade because students 
at this grade are expected to use reading to aid in their learning and not concentrate 
on the development of reading skills per se (Reppen, 2001). 

The modality of reading was selected for this stage of research; test 
development efforts with other modalities will follow. We began with reading 
because it is a focal point of the NCLB legislation for all students and presents an
especially critical burden for English learners, who must also be tested annually. All 
students are expected to read at grade level and must have the ability to do so in 
order to access and demonstrate knowledge on standardized assessments. In 
addition, reading provides increasingly complex linguistic input to students from 
grade to grade since the reliance on reading comprehension in instruction and
assessment across subject matter increases with grade level (RAND, 2002). 

The information from the current study will be used for test specification 
development for English language tasks appropriate for fifth grade. Specifically, 
descriptions of the lexical, grammatical, and organizational features of the textbook 
selections will allow us to create a profile of typical texts in each subject. The profiles 
will be part of the foundation for developing specifications and academic language 
assessment tasks and items. Characterization of texts across a range of subject areas 
will provide information on content-specific language demands, while 
characterization of texts by grade level or cluster will provide data on 
developmental demands. 

To provide the necessary empirical foundation for test development, two 
research questions guided our current efforts. Those questions are: 

1. What are the linguistic characteristics of mathematics word problems and
multi-paragraph texts in science and social studies at the fifth-grade level?

2. How do the identified characteristics of texts in different subject areas compare
to one another?

Answers to the two questions provide the empirical evidence needed for 
generating specifications for academic language reading tasks by allowing us to 
systematically describe the nature of language use in each type of text analyzed. The 
range and type of vocabulary along with the grammatical structures used to 
accomplish text purpose are synthesized into an academic language framework that
will facilitate not only test development but eventually curriculum and materials 
development as well as professional development. We turn now to a discussion of 
the general methodology used in this study. 

General Methods 

This section introduces the methodology used in this study. First we present 
our research approach in association with the two research questions. Then we 
present the procedures followed in the study, including text selection, coding 
development, analysis, and determining accuracy and reliability of coding. 

Research Approach. To address the research questions, a range of analyses 
were applied systematically to the language in the mathematics, science, and social 
studies textbook selections. Each research question is presented below along with an
explanation of the types of analyses used to help answer the question. 

Question 1. What are the linguistic characteristics of mathematics word problems and
multi-paragraph texts in science and social studies at the fifth-grade level?

One of the first steps in the development of reading tasks is the selection of 
texts on which to base those tasks. Guidelines that provide characteristics of texts at 
the appropriate level(s) are needed for the selection process. The guidelines then 
become part of the test specifications. To present the text characteristics for the 
current effort, we produced text profiles for the subject areas that include key 
components of the text types to be used. 

To characterize the mathematics word problems and the science and social 
studies extended texts, we began by generating descriptive data for each selection in 
the study. Then, to more fully describe the texts, we looked specifically at the areas 
of vocabulary, grammar, and discourse mentioned above. The discussion here 
provides an overview of the types of analyses conducted; the analyses are fully 
elaborated in their respective chapters below. 

Descriptive Data. Measures of sentence length provide basic descriptive data 
and have long been associated with reading difficulty (Zakaluk & Samuels, 1988). 
Longer sentences tend to pose greater challenges for students. These challenges are 
found to be even greater for students whose home language is not English (Abedi, 
Lord & Plummer, 1997). For this reason sentence length is an important factor for 
creating the subject area profiles. Other basic descriptive data include such 
information as number of words, sentences, and paragraphs in the selections, mean 
sentence length calculated by topic and subject area (see the discussion on text 
selection later in this chapter for specific information on the number of selections per 
topic and the number of topics per subject area), and mean number of sentences per 
paragraph. The summary information is included in each subject-area profile. These 
descriptive data, presented in Chapter 2, provide indicators of similarities and 
differences in the nature of texts used in subject areas at the fifth grade. 
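
The descriptive measures listed above can be illustrated with a minimal sketch. It is not the procedure used in the study: the function name `describe_selection`, the sample passage, and the segmentation rules (paragraphs separated by blank lines, sentences ending in a period, question mark, or exclamation point) are all assumptions made for illustration.

```python
import re


def describe_selection(text: str) -> dict:
    """Return basic descriptive counts for one text selection.

    Assumptions (not from the report): paragraphs are separated by blank
    lines, and a sentence ends at '.', '?', or '!' followed by whitespace.
    Abbreviations such as 'e.g.' would be split incorrectly by this rule.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    sentences = []
    for paragraph in paragraphs:
        parts = re.split(r"(?<=[.?!])\s+", paragraph.strip())
        sentences.extend(s for s in parts if s)
    words = re.findall(r"[A-Za-z0-9']+", text)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "mean_sentence_length": len(words) / len(sentences),
        "mean_sentences_per_paragraph": len(sentences) / len(paragraphs),
    }


# Hypothetical passage, not taken from any of the textbooks analyzed.
sample = (
    "Water changes form. It can be a solid, a liquid, or a gas.\n\n"
    "Heat causes ice to melt."
)
stats = describe_selection(sample)
```

For the hypothetical passage, the sketch counts 18 words in 3 sentences and 2 paragraphs, giving a mean sentence length of 6 words and a mean of 1.5 sentences per paragraph.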

Lexical Data. For vocabulary, features believed consequential in defining 
academic vocabulary and for describing the acquisition and use of vocabulary in 
academic settings were identified in all text selections in mathematics, science, and 
social studies. First, lexical diversity, a measure that reflects the variety of 
vocabulary in a given text, was calculated by dividing the number of different words 
(word types) by the total number of words (tokens) in that text. This lexical diversity
ratio is important in characterizing selections because it helps establish the range of 
vocabulary students must be able to understand. Vocabulary that can be identified 
as part of the academic English lexicon by its usage in academic contexts includes 
both the specialized word usage within academic disciplines, as well as the general 
academic vocabulary not exclusive to any one discipline. 
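
As an illustration of the lexical diversity ratio described above (word types divided by tokens), the following sketch computes the ratio for a short hypothetical passage. The tokenization and case-folding rules are assumptions; the report does not specify here how word forms were normalized (for example, whether inflected forms count as separate types).

```python
import re


def lexical_diversity(text: str) -> float:
    """Type-token ratio: number of different words / total number of words.

    Assumption: words are lowercased, and runs of letters or apostrophes
    count as tokens. Other normalization choices would change the ratio.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    return len(types) / len(tokens)


# Hypothetical passage, not taken from the textbooks analyzed.
sample = "The plant needs water. The water moves up the stem of the plant."
ratio = lexical_diversity(sample)  # 8 types / 13 tokens
```

Because the ratio tends to fall as a text gets longer (frequent words repeat), comparisons are most meaningful between selections of similar length.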

To help us specify and define what constitutes an academic word beyond the 
contextual aspects, we also investigated discrete features of the vocabulary of the 
text selections that included: (a) vocabulary that appears on low frequency word 
lists for the fifth-grade level; (b) vocabulary that contains three or more syllables; 
and (c) vocabulary that is morphologically derived from root lexical forms. 

Word frequency and the frequency of multisyllabic words and derived words 
serve as indices of difficulty and will be important in the future for comparative 
purposes with texts from other grades. The vocabulary identified for these three 
features was compared to the vocabulary identified as academic vocabulary to 
determine degree of overlap (see Chapter 3 for a discussion of the identification 
process for academic vocabulary). Were the overlap significant with any one (or 
more) of the features, it might be possible to use that feature(s) to identify academic 
vocabulary in future research and test development efforts to streamline the 
identification process. 
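
The overlap comparison described above can be sketched as a simple set operation. The word lists below are hypothetical, and the direction of the comparison (the share of academic words that also carry a feature) is an assumption; the actual identification process is described in Chapter 3.

```python
def overlap_share(feature_words: set, academic_words: set) -> float:
    """Share of the academic vocabulary that also carries a given feature."""
    return len(feature_words & academic_words) / len(academic_words)


# Hypothetical lists for illustration only.
academic_words = {"evaporation", "condensation", "analyze", "cell"}
multisyllabic_words = {  # words of three or more syllables
    "evaporation", "condensation", "analyze", "hurricane", "tornado",
}

share = overlap_share(multisyllabic_words, academic_words)  # 3 of 4 words
```

A high share for one feature would suggest that the feature could serve as a shortcut for flagging candidate academic words, while words such as "cell" in the hypothetical list show why a single feature is unlikely to capture the whole category.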

Analysis of two additional lexical-level features was also included: frequency 
and variety of clause connectors and frequency of nominalizations. Specifically, the 
identification of adverbial clause connectors (often called adverbial subordinators) 
was an important part of the lexical analyses because they frequently signal meaning 
relationships between clauses. Since research suggests that students may not 
interpret these words correctly, resulting in possible misunderstanding of the 
meaning relationships they encode (for discussion, see Celce-Murcia &
Larsen-Freeman, 1983; Halliday & Hasan, 1976), the frequency of their occurrence should be
reflected in the text profiles and taken into consideration in the test development 
stage. Nominalizations were included because they can increase semantic 
complexity (Martin, 1991) and provide additional descriptive information about 
lexical usage in academic texts. Discussion of the lexical analyses is provided in 
Chapter 3. 
Grammatical Data. For grammar, we looked at the following features:
percentages of sentence types used (i.e., simple, compound, complex, and 
compound-complex), numbers of dependent and coordinate clauses (including 
percentages of total clauses), occurrences of passive voice verb forms, prepositional 
phrases, noun phrases, and participial modifiers. Since clause types have been found 
to be a factor in student test performance with higher frequency of dependent 
clauses associated with greater processing difficulty, range and frequency of 
sentence types, which reflect clause usage, are important considerations in text 
selection for assessment purposes (see Lord, 2002). Passive voice verb forms, 
prepositional phrases, noun phrases, and participial modifiers contribute to length 
and/or semantic complexity in texts, often increasing the processing load for the 
reader. They are also identified with academic prose (see Schleppegrell, 2001, 2004, 
for a review), and are therefore important features to note in characterizing fifth- 
grade texts. Discussion of the grammatical analyses is provided in Chapter 4. 
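
To show how percentages of sentence types might be tabulated once each sentence has been coded, here is a minimal sketch. It assumes the coding itself has already been done by a human analyst; the list of codes is hypothetical, and the function name is introduced only for illustration.

```python
from collections import Counter

SENTENCE_TYPES = ("simple", "compound", "complex", "compound-complex")


def sentence_type_percentages(codes: list) -> dict:
    """Percentage of sentences in a selection assigned to each type."""
    counts = Counter(codes)
    total = len(codes)
    return {t: 100 * counts[t] / total for t in SENTENCE_TYPES}


# Hypothetical codes for an eight-sentence selection.
codes = [
    "simple", "simple", "complex", "compound",
    "simple", "complex", "compound-complex", "simple",
]
pct = sentence_type_percentages(codes)
```

In this hypothetical selection, half of the sentences are simple, a quarter are complex, and compound and compound-complex sentences each account for 12.5%.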

Organization of Discourse. For discourse, we looked at the organizational 
features of the selections in the study — the language functions and devices writers 
used to express ideas and present factual information. In addition to language 
functions, which are a key component of the language that students must use to 
interpret and derive meaning from texts (Butler et al., 2004), writers often use 
different types of writing devices to provide additional detail, to exemplify a point, 
or to ensure reader comprehension, while they use other techniques to provide 
instructional guidance to students (e.g., by referring them to graphics or prior 
lessons). Therefore, in order to capture a broad spectrum of the features that exist in 
the textbooks, we analyzed both functions and additional features used by writers to 
convey ideas. The discussion of organizational features (Chapter 5) shows how our 
analyses concentrated on three levels of textual organization: rhetorical mode, 
dominant text features, and supporting text features. 

The data characterizing the linguistic features of fifth-grade mathematics, 
science, and social studies textbooks provide a basis for the text profiles and for the 
content of academic language proficiency assessment tasks and items at this grade 
level. These data, however, required further analysis before they could be used in 
test specifications for an assessment of general academic language proficiency. The 
second research question guided the subsequent analyses. 

Question 2. How do the identified characteristics of texts in different subject areas
compare to one another?
The answer to the second research question required synthesis of the
descriptive data and the linguistic analyses (Chapters 2-5). The first step involved 
synthesis of the descriptive, lexical, grammatical, and discourse data into "profiles" 
of language for each subject area (Chapter 6). The subject-area profiles were then 
compared, allowing commonalities and differences to be systematically compiled. 
Using analysis of variance (ANOVA) procedures and confidence interval
calculations, we were able to determine which of the differences across the three
subjects in terms of key descriptive, lexical, and grammatical features are statistically 
significant. The points of commonality will later become candidates for inclusion in 
general academic language proficiency assessment tasks and items. Points of 
difference are possible candidates for tasks and items that are focused on the 
language of specific subject areas. The discussion turns now to our research 
procedures. 
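
For readers unfamiliar with the ANOVA comparison mentioned above, the following sketch computes a one-way F statistic for one feature across three groups. The values are invented for easy arithmetic and are not results from the study, and the sketch stops at the F statistic (it does not compute a p-value or confidence intervals). If each of the 12 selections per subject contributed one value, the degrees of freedom would be 2 and 33; that mapping is an assumption, not a detail stated in this chapter.

```python
def one_way_anova_f(groups: list) -> tuple:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values of one feature for three subjects (three selections each).
mathematics = [1.0, 2.0, 3.0]
science = [2.0, 3.0, 4.0]
social_studies = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f([mathematics, science, social_studies])
```

With these invented values the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26/2)/(6/6) = 13 on 2 and 6 degrees of freedom. A large F indicates that the subject means differ by more than the variation within subjects would suggest.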

Procedures 

This section provides a discussion of the procedures followed in the 
identification and analysis of the textbook selections. The discussion includes (a) text 
selection rationale and procedure, (b) analysis procedures, and (c) accuracy and 
reliability procedures. 

Text Selection Rationale and Procedure. An important first step was to 
determine which texts would be selected for analysis in the three subject areas — 
mathematics, science, and social studies. We needed to determine which types of 
texts students encounter most frequently within and across subject areas; thus we 
needed texts that are both representative of typical texts and of sufficient length to 
provide material for test development. The word problem was established as the 
unit of analysis for mathematics for two main reasons: (a) Word problems provide a 
greater range of language use than other text types in mathematics textbooks, and 
(b) characterizing the language demands of word problems in textbooks will also 
best inform assessment development efforts, that is, the language of word problems 
mirrors the language that students commonly encounter on mathematics 
assessments that rely on word problems in addition to straight computation. 

Restricting our focus to word problems may of course bias the results we
obtain for mathematics; the results may therefore not be representative of print
materials prepared for use in other contexts, such as during mathematics instruction. We
identified two types of word problems in the textbooks: those that include and 
require use of graphics to solve the problem and those that do not. For our research,
we chose word problems without graphics and with a minimum of two sentences, at least
one of which must be a declarative statement that provides the setup for solving the
mathematics problem.

Based on previous research (Butler et al., 2004), multi-paragraph texts were 
identified as the most frequently occurring type of text in science and social studies 
textbooks and were, therefore, chosen for analysis in this research (see Appendix A 
for examples of selected texts from mathematics, science, and social studies). 

The text selection process began with identification of topics in the California 
subject-matter standards for fifth grade in each subject area. Three textbooks from 
different publishers were selected from the list of California-approved textbooks for 
each subject area as the sources from which to select the samples. Three were chosen 
in order to reduce bias (e.g., different textbooks might have different writing styles) 
that could result from analyzing textbook samples from only one publisher. We then 
compared the topics in the subject-matter standards with the textbook topics to 
identify the closest matches possible in content (e.g., Matter, Storms). Since textbooks
varied in their treatment of topics, we further narrowed our focus to topics that
occurred in all three textbooks with similar subtopics, vocabulary, and concepts in
order to ensure adequate and comparable topic coverage. For instance, selections in
each science textbook on the topic Storms included subsections on hurricanes and 
tornadoes; the similarity in subsections supported the topic Storms as a candidate for 
inclusion in the research. For each of the three subjects we chose four topics; we then 
chose three selections for each topic for a total of 12 text selections per subject. In 
total, 36 selections were identified for analysis (see Table 1 for the topics selected 
across subjects). 
Table 1

Topics Selected for Each Subject (n = 3 selections per topic)

Mathematics       Science        Social Studies
--------------    -----------    ---------------------------
Decimals          Matter         Declaration of Independence
Fractions         Plants         Industrial Revolution
Multiplication    Storms         Pilgrims
Ratio             Water cycle    Slavery

Note. Total number of selections = 36.



The fifth-grade textbooks we utilized in this study averaged 567 pages each. To 
give a concrete sense of the amount of data that we analyzed in each textbook, we 
calculated the total number of textbook pages selected for analysis, and divided the 
number by the total number of pages in the textbook. On average, the volume of 
data that we have analyzed constitutes around 3% of each of the nine textbooks. In 
absolute terms, this totals 154 pages of text analyzed. In typical fifth-grade textbook 
layout we found this includes an average of one illustration per page. There is little 
variation in the average proportion of text pages selected across subject areas, 
ranging from 2.60% of the text in social studies textbooks and 3.03% of the text in 
science, to 3.43% in mathematics. 

In mathematics, we selected a number of word problems for each topic, 
carefully balancing the number of word problems and the total number of words 
selected from each textbook. In science and social studies, one multi-paragraph 
passage per topic was selected per textbook. The passages all begin at a natural 
starting point for the topic, often signaled by a header in the textbook, and end at a 
natural breakpoint. We attempted to select texts that were of sufficient and similar 
length for linguistic analyses across the three subjects, though there is some 
variation in length due in part to differences in presentation across textbooks and 
also due in part to subject matter differences (e.g., the depth of topic coverage or the 
variety of subtopics presented within a topic). The social studies selections, for 
example, tend to be longer than the science selections. We felt it was more important 
to ensure a coherent piece of text by including the beginning and end of each 
selection rather than stopping abruptly in the middle of a selection to achieve 
uniform word length across subject areas and textbooks. Once all the selections were 
identified for a subject area, they were entered into electronic format for data 
analysis. To standardize comparisons across subject areas, we converted raw data 
into percentage, ratio or rate data (i.e., number of instances of a feature per 100 
sentences). 
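
As a minimal sketch of the standardization just described, the following Python functions convert a raw count into a rate per 100 sentences or a percentage. The function names and the example counts are hypothetical and are not taken from the study data.

```python
def rate_per_100_sentences(feature_count, sentence_count):
    """Number of instances of a feature per 100 sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 100.0 * feature_count / sentence_count


def percentage(part, whole):
    """Share of `whole` accounted for by `part`, as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Hypothetical selection: 18 passive verb forms in 45 sentences.
passive_rate = rate_per_100_sentences(18, 45)
print(passive_rate)
```

Expressing counts as rates allows selections of different lengths to be compared directly.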

Visuals, graphics, and primary source excerpts were not included at this stage 
of the study with the exception of some social studies selections that included one- 
or two-sentence excerpts or quotes from primary sources that were an integral part 
of the selection. Future analyses should be conducted to examine the role of primary 
sources and visual information used in conjunction with linguistic input. 

Analysis Procedures. A total of seven researchers were trained for the different 
analyses with pairs of researchers focusing on descriptive, lexical, grammatical, and 
discourse features respectively. Guidelines were developed specifically for each type 
of analysis by the researchers responsible for the particular area of investigation. 3 
The guidelines evolved as the analytic process for each type of analysis was refined. 
Two theoretical linguists carried out the grammatical analyses, and researchers with 
applied linguistics training focused on the lexical and discourse pieces. The analyses 
are discussed in detail in their respective chapters below. 

Accuracy and Reliability Procedures. To help ensure the strength of our 
results, accuracy or reliability checks, as appropriate, were conducted at each stage 
of the research for all the analyses performed on the textbook selections. The general 
procedures followed are reported here for each type of analysis. The results are 
reported in the respective chapters below. 

Accuracy checks were conducted on the analyses that were non-subjective (e.g., 
word counts), while reliability was conducted for those analyses that required the 
judgment of raters based on specific criteria. We established a goal of 95% or higher 
for the accuracy checks and 85% or higher interrater reliability. 

We conducted accuracy checks for the descriptive analyses, including number 
of words, sentences, and paragraphs per selection, as well as mean number of 
sentences per paragraph and mean sentence length by topic and subject area. In 
addition, accuracy checks were conducted for counts of multisyllabic words of three 
or more syllables, calculations of lexical diversity, and counts of words appearing on 
low frequency word lists. Initial counts were established by the automated 
Computer Language ANalysis (CLAN) programs of the Child Language Data 
Exchange System (CHILDES; MacWhinney, 1995; MacWhinney & Snow, 1990); 
project researchers then verified the counts manually. 

3 The Academic English Language Proficiency (AELP) Guidelines for the Linguistic Analysis of Texts 
are being prepared for general distribution by CRESST. 

We calculated reliability between coders for identification of subcategories of 
academic vocabulary. We also calculated reliability for identification of derived 
words, noun phrases, prepositional phrases, dependent clauses, participial 
modifiers, passive voice, and nominalizations. For our analyses of organizational 
features, we calculated interrater reliability for identification of rhetorical mode, 
dominant text features, and supporting text features. 

We turn now to Chapter 2 for a discussion of the descriptive analyses. 

CHAPTER 2: 



DESCRIPTIVE ANALYSES 

One of the first steps in this research was to conduct descriptive analyses of the 
text selections, which, along with the other analyses, aid in the construction of 
profiles of text characteristics that typify each subject area. The profiles can be used 
to guide text selection and item development efforts. Additionally, these analyses 
helped us to identify the degree of uniformity and variation in language use across 
subject-area topics. We first describe the procedures for analyzing the text selections 
and then present the results. 



Procedures 

First, all of the text selections were typed into electronic files and checked for 
typographical accuracy. Next we ran computer-assisted language analyses using 
CLAN to determine the number of words and sentences in each selection. The mean 
number of sentences and mean sentence length, as well as measures of central 
tendency and dispersion for each selection were calculated using procedures of the 
statistical package SPSS. One researcher performed all the analyses, and a second 
researcher conducted accuracy checks on a third of the selections, establishing an 
accuracy rate of 99%. 
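
The kind of counts described above can be illustrated with a short Python sketch. It is not the CLAN or SPSS procedure used in the study; the sentence splitter and the word-counting rule below are simplifying assumptions, and the sample word problem is invented.

```python
import re
import statistics


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_words(sentence):
    """Count whitespace-delimited tokens that contain a letter or digit."""
    return sum(1 for tok in sentence.split() if re.search(r"[A-Za-z0-9]", tok))


def describe(text):
    """Basic descriptive data for one selection."""
    lengths = [count_words(s) for s in split_sentences(text)]
    return {
        "words": sum(lengths),
        "sentences": len(lengths),
        "mean_sentence_length": statistics.mean(lengths),
        "median_sentence_length": statistics.median(lengths),
        "min_max": (min(lengths), max(lengths)),
    }


# Invented word problem, for illustration only.
sample = "Maria has 12 apples. She gives 3 to Ben. How many apples are left?"
summary = describe(sample)
print(summary)
```

A splitter this simple will mis-segment abbreviations such as "Dr. Smith", which is one reason automated counts of this kind benefit from the manual accuracy checks described above.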



Findings 

The results of the analyses are presented below by subject area, beginning with 
mathematics. A summary at the end compares the results across subjects. 

Mathematics 

Table 2 provides the descriptive data for mathematics overall and by topic. 4 
The mathematics selections have a total of 212 word problems with 7,008 words total 
and a mean number of 33 words per word problem. The subject area mean number 
of sentences per word problem is 3, with a range of 2-7 sentences. It is important to 
note that all word problems selected for analysis deliberately consisted of two or 
more sentences, even though word problems consisting of only one sentence do 
exist. 

4 In the results chapters (2-5), the tables provide data carried out to two decimal places, while data in 
prose are rounded to the nearest whole number. 



The subject area mean sentence length is 11 words with a range of 1-39 words 
per sentence. The median values range from 9-11 words, virtually identical to the 
topic means, suggesting normal distributions (i.e., the mean reflects a central 
tendency in the data). Comparing across topics there is little variation in mean 
number of sentences per word problem (SD=.51) or in mean sentence length 
(SD=.86). 



Table 2 

Descriptive Data for Mathematics by Topic 

Statistics                                  Decimals  Fractions  Multiplication  Ratios  Subject area total/mean/range (SD)
Total no. of words                          1788      1716       1650            1854    7008
Total no. of word problems                  57        56         52              47      212
Mean no. of words per word problem          31.37     30.64      31.73           39.45   33.06 (5.06)
Mean no. of sentences per word problem      2.95      2.96       3.17            3.43    3.13 (.51)
Range of no. of sentences per word problem  2-6       2-6        2-5             2-7     2-7
Mean no. of words per sentence              10.80     10.40      10.35           11.63   10.80 (.86)
Median sentence length                      8.67      9.50       10.00           11.00   9.79
Range of no. of words per sentence          1-30      1-25       1-25            1-39    1-39

Science 

Table 3 provides the descriptive data for science overall and by topic. There are 
7,261 words total in the twelve selections, with an average of 605 words and 11 
paragraphs per selection. The subject area mean number of sentences per paragraph 
is just over 4, with a range of 1-8 sentences. The subject area mean sentence length is 
13 words with a range in sentence length of 1-37 words. The median sentence length 
across topics ranges from 12-13 words, almost identical to the means. The standard 
deviations for subject area mean paragraph and mean sentence lengths are .45 and 
.86, respectively. 

For science, the descriptive statistics show consistency within and across topics. 
However, the average length of the selections varies from 531 to 673 words, which 
directly affects the number of paragraphs and sentences in each selection. Even with 
this variation, overall statistics such as the mean number of sentences per paragraph 
are similar. This suggests relative consistency in how the selections are structured at 
the paragraph and sentence levels across topics. 

Table 3 

Descriptive Data for Science by Topic 

Statistics                               Matter  Plants  Storms  Water cycle  Subject area total/mean/range (SD)
Total no. of words                       2018    1938    1711    1594         7261
Mean no. of words per selection          672.67  646.00  570.33  531.33       605.08 (66.37)
Mean no. of paragraphs per selection     12.67   12.33   11.00   9.33         11.33 (1.78)
Mean no. of sentences per paragraph      3.85    4.18    4.27    4.44         4.18 (.45)
Range of no. of sentences per paragraph  1-8     2-8     1-8     1-7          1-8
Mean no. of words per sentence           13.83   12.78   12.28   13.08        12.99 (.86)
Median sentence length                   13.33   12.67   12.33   12.33        12.67
Range of no. of words per sentence       3-34    1-30    3-34    3-37         1-37

Social Studies 

Table 4 provides descriptive statistics for social studies overall and by topic. 
The total number of words in the social studies selections is 10,878, with an average 
of 907 words per selection. There is a subject area average of 17 paragraphs per 
selection, with a mean number of 4 sentences per paragraph and a range of 1-9 
sentences. The subject area mean sentence length for social studies selections is 14 
words, with a range of 3-43 words per sentence. The median sentence length across topics 
ranges from 12-14, close in value to their respective means. Comparing across topics, 
there is little variation in either subject area mean paragraph length or mean 
sentence length (SD=.48 and SD=.75 respectively). 

Table 4 

Descriptive Data for Social Studies by Topic 

                                         Declaration of  Industrial                      Subject total/mean/
Statistics                               independence    revolution  Pilgrims  Slavery   range (SD)
Total no. of words                       2,528           2,704       2,689     2,957     10,878
Mean no. of words per selection          842.67          901.33      896.33    985.67    906.50 (85.99)
Mean no. of paragraphs per selection     16.33           16.67       16.33     19.00     17.08 (2.35)
Mean no. of sentences per paragraph      3.56            4.00        4.40      3.93      3.98 (.48)
Range of no. of sentences per paragraph  2-8             2-8         1-9       2-8       1-9
Mean no. of words per sentence           14.58           13.55       12.56     13.40     13.52 (.75)
Median sentence length                   14.17           13.33       12.00     12.33     12.96
Range of no. of words per sentence       4-33            3-41        3-31      3-43      3-43

For social studies, the descriptive statistics show little variation within or across 
topics, except that as with science, the selections differ in average length by as many 
as 143 words per selection. Even with this variation, paragraph and sentence data 
such as the mean number of sentences per paragraph are similar across topics. 

Chapter Summary 

Overall, differences in basic descriptive data among subject areas exist, but are 
minimal. Table 5 provides comparisons of the descriptive data for subject matter 
totals in the three subject areas. 

Table 5 

Comparison of Descriptive Data Across Subject Areas 

Statistics                                             Mathematics  Science      Social Studies
Mean no. of sentences per word problem/paragraph (SD)  3.13 (.51)   4.18 (.45)   3.98 (.48)
Range of no. of sentences per word problem/paragraph   2-7          1-8          1-9
Mean no. of words per sentence (SD)                    10.80 (.86)  12.99 (.86)  13.52 (.75)
Range of no. of words per sentence                     1-39         1-37         3-43

The mean number of sentences per word problem in mathematics is 3; for 
science and social studies, the mean is 4 sentences per paragraph. The range of 
number of sentences per paragraph for science and social studies is nearly identical 
as well. At the sentence level, the mean sentence length is slightly shorter for 
mathematics at 11 than for science at 13 and social studies at 14 with a standard 
deviation of about 1 word per sentence for all three subject areas. In treating 
mathematics word problems and science and social studies paragraphs as 
comparable units of analysis, we found that mathematics word problems were 
shorter and used shorter sentences than paragraphs and sentences in either the 
science or social studies text selections. 

We turn now to the analyses of the vocabulary, grammatical, and 
organizational features in the next three chapters, which will help to further specify 
the characteristics of these subject-area texts. 

CHAPTER 3: 



LEXICAL ANALYSES 

Continuing our analysis of the same textbook selections described in Chapter 2, 
our goal in these lexical analyses was to examine the nature of vocabulary used in 
mathematics, science and social studies textbooks. The analyses, as already 
described in the General Methods section of Chapter 1, focus on key aspects 
believed consequential for the understanding and acquisition of vocabulary in 
academic settings. This includes determining the degree of diversity in vocabulary 
with a basic description of the number of unique word types, as well as identifying 
the academic English lexicon by word usage in academic contexts. For example, 
such contexts include the specialized use of vocabulary in academic disciplines (e.g., 
thermal, multiplication), as well as usage encountered predominantly in academic 
contexts but not exclusive to any one discipline (e.g., synthesize, report) (Nation & 
Coxhead, 2001). 

To help specify and define what constitutes an academic word and what does 
not beyond these contextual aspects (i.e., specialized versus non-specialized uses), 
we also investigated discrete features of words in the text selections that included: 
(a) vocabulary that appears on low-frequency word lists for the fifth-grade level; (b) 
vocabulary that contains three or more syllables; and (c) vocabulary that is 
morphologically derived from root lexical forms. A synthesis of analyses is reported 
whereby we examine the degree of overlap between vocabulary meeting the low 
frequency, three or more syllable, derived, and academic vocabulary criteria. Two 
additional lexical-level analyses were conducted and are included in this chapter: 
frequency and variety of clause connectors and frequency of nominalizations. Both 
were viewed as potential hallmarks of printed texts in the academic setting. 

For each analysis we provide information about raw amount (tokens), number 
of unique words (types), and percentage of the total number of tokens and types for 
which different lexical features account. We also describe variation in the results of 
these analyses by topic and across the three subjects. 5 Again, the main emphasis is 
on differences across topics and subjects rather than across individual textbooks 
from which selections were made. Description of variation is provided 
quantitatively with calculations of standard deviations and minimum and 
maximum values, as well as qualitatively with examples of specific lexical items 
encountered in the different analyses. 

5 The lexical analyses are presented at the topic level rather than aggregated immediately to the 
subject level as they are for analyses of grammar (Chapter 4) and organizational structures (Chapter 
5), since vocabulary is likely to be influenced by the choice of topic. 

Lexical Diversity 

The lexical diversity measure shows the amount of variety in vocabulary items 
used in the different subject areas. That is, lexical diversity can be expressed as the 
number of different (i.e., unique) word types that appear in the text selections 
relative to the overall number of words used. The smaller the type/token ratio, the 
less diverse the vocabulary in the selection; that is, the use of word types is more 
repetitious (e.g., Phillips, 1973). 

Procedures 

To calculate lexical diversity, we used the type/token approach utilized in 
studies of language development (e.g., Templin, 1957; Sokolov & Snow, 1994). For 
such an analysis to be comparable across subjects and across topics within subjects, 
we standardized the total number of tokens (overall number of words) by using the 
first 450 words of each mathematics word problem selection, or of each passage selection in 
science and social studies. Prior research has shown that use of greater than 200 
tokens yields more stable ratio calculations (Richards, 1987). Using the FREQ option 
of CLAN (MacWhinney, 1995; MacWhinney & Snow, 1990), the number of different 
word types within this 450 was tallied, and the ratio of the number of types to the 
450 tokens was interpreted to provide an indication of lexical diversity in the 
textbook selections. The number of word types and the ratios of twelve selections 
(30% of the data) across the three subjects and topics were also manually calculated 
independently to check for accuracy of the CLAN program. With two exceptions the 
count of types and calculation of ratios were identical (CLAN treated the numbers 
that included a decimal point as separate words). 
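
The type/token calculation can be sketched in a few lines of Python. This is an illustration of the technique rather than the CLAN FREQ program; the tokenizer below is an assumption, written so that a number containing a decimal point is kept as a single token (the point on which CLAN and the manual counts differed), and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a decimal such as 3.5 stays a single token."""
    return re.findall(r"\d+(?:\.\d+)?|[a-z]+(?:['-][a-z]+)*", text.lower())


def type_token_ratio(text, window=450):
    """Return (types, tokens, ratio) for the first `window` tokens of a text."""
    tokens = tokenize(text)[:window]
    if not tokens:
        raise ValueError("no tokens found")
    types = set(tokens)
    return len(types), len(tokens), len(types) / len(tokens)


# Invented passage, far shorter than the 450-token window used in the study.
n_types, n_tokens, ratio = type_token_ratio(
    "The water cycle moves water. The cycle repeats."
)
print(n_types, n_tokens, ratio)
```

With the window fixed at 450 tokens, a selection containing about 193 unique word types has a ratio of about .43, and one containing about 223 has a ratio of about .49.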

Findings 

Mathematics. Table 6 provides the average number of different word types, the 
ratios by individual mathematics topics, and the overall subject-area averages for 
these values. We see that there is little variation in the average number of unique 
word types by topic (the overall subject-area standard deviation is below 10), 
although there is some difference in how well the averages represent the central 
tendency within topic: standard deviations for the topics range from just 2 for the 
selections in Decimals to 16 for the selections in Multiplication. 

The topics all have a type/token ratio of approximately .40. That is, on average, 
slightly more than half the words in these selections appear more than once. This 
level of lexical diversity is somewhat low. In the spoken language of even relatively 
young children, type/token ratios are typically on the order of .50 (Pan, 1994). The 
lexical diversity in written language, especially in an academic context that is 
assumed to be purposefully introducing new lexical items to students, would be 
expected to be greater. 

Science. Table 6 provides the average number of unique word types, the ratios 
by science topics, and the overall averages for these values. There is close to a 20- 
word difference in the topic totals for number of unique word types, suggesting a 
degree of variation across topics in lexical diversity. This is confirmed by the 
standard deviation of about 13 for the subject-area average. Matter selections had the 
fewest number of unique word types, whereas the selections for the topic of Storms 
had the greatest. Within-topic variation as measured by the standard deviation is 
also quite high and similar across topics. Again the topics all have relatively low 
type/token ratios of approximately .40. 

Social Studies. The total number of unique word types is 223 on average. Table 
6 shows that there is only a slight degree of variation around this mean (standard 
deviation is 12), with a difference of 17 words from the lowest number of unique 
word types in Industrial Revolution selections to the highest in Pilgrims selections. 
Within-topic variation also differs greatly, with standard deviations ranging from 
about 4 for the Pilgrims selections to nearly 17 for the Declaration of Independence 
selections. The type/token ratios are close to .50. In other words, these selections 
introduce students to a new word every other word on average. 

Summary. Comparing across the subjects we see that mathematics appears to 
be the most homogeneous in terms of the variation in the number of unique word 
types across topic selections. According to examination of type/token ratios, neither 
mathematics nor science is as lexically diverse as social studies. Science, however, 
also presents the greatest degree of variation in the average number of unique word 
types across topic selections of any of the subject areas. The size of type/token 
ratios, in oral language at least, has been linked to the developmental stage of the 
addressee. Specifically, Phillips (1973) found smaller ratios in child-directed speech 
than in adult-adult talk. We hypothesize, on the one hand, that ratios will increase in 
size the higher the grade level of the textbook. On the other hand, maintaining a 
topic focus may keep lexical diversity ratios stable as well, although these hypotheses 
await future study. How often the individual words are used is something 
we turn to next in our examination of frequencies of words by subject area. 

Table 6 

Lexical Diversity in Three Subjects by Topic (Standardized to 450 Word Tokens) 

Mathematics averages (SD)

Statistic                       Decimals        Fractions       Multiplication  Ratios          Subject-area average (SD)
Total no. of unique word types  195.67 (2.08)   196.33 (8.50)   189 (16.46)     192.33 (10.97)  193.33 (9.72)
Type/token ratio                .43 (.00)       .44 (.02)       .42 (.04)       .43 (.02)       .43 (.02)

Science averages (SD)

Statistic                       Matter          Plants          Storms          Water cycle     Subject-area average (SD)
Total no. of unique word types  174.33 (11.93)  178.67 (12.86)  193.67 (10.69)  188 (14.00)     183.67 (13.24)
Type/token ratio                .39 (.03)       .40 (.03)       .43 (.02)       .42 (.03)       .41 (.03)

Social Studies averages (SD)

                                Declaration of  Industrial
Statistic                       Independence    Revolution      Pilgrims        Slavery         Subject-area average (SD)
Total no. of unique word types  225.33 (16.77)  212 (10.82)     229 (3.61)      224.33 (12.90)  222.67 (12.24)
Type/token ratio                .50 (.04)       .47 (.02)       .51 (.01)       .50 (.02)       .49 (.03)


Academic English Vocabulary 

Chamot and O'Malley define academic language as "the language that is used 
by teachers and students for the purpose of acquiring new knowledge and 
skills... imparting new information, describing abstract ideas, and developing 
students' conceptual understanding" (1994, p. 40). However, prior attempts to 
operationalize academic language for research and practice purposes have proven 
elusive, with some ESL teachers suggesting academic language refers to the 
functions performed in the classroom and other teachers describing academic 
language as the specialized vocabulary used in subject areas (Solomon & Rhodes, 
1995). Following Scarcella and Zimmerman (1998), we suggest that academic 
vocabulary, as one component of the broader academic language construct, 
comprises both general (e.g., evidence, demonstrate and represent) and specialized 
lexicons (e.g., diameter, condenses, and abolitionist), each of which students must 
acquire in order to become fully proficient in English in the academic setting. 
According to Nation (2001), at the tertiary education level, general academic 
vocabulary covers on average 8.5% of words in academic texts and specialized or 
technical vocabulary covers about 5% of words in academic texts (see also Bailey & 
Butler, 2003, 2004; Martin, 1976; Nation & Coxhead, 2001; Stevens, Butler & 
Castellon-Wellington, 2000 for reviews of academic vocabulary). 

Procedures 

The primary distinction in the coding schema developed by the research team 
for academic language analyses was between academic and non-academic usage of 
words (i.e., general service vocabulary; West, 1953). Phrases and compound words 
were coded as a single unit (i.e., the separate words in the phrase or compound were 
not counted as individual words). Within academic usage we also distinguished 
between specialized academic words and general academic vocabulary that cuts 
across disciplines. This distinction is important for future test development efforts 
that will attempt to target a broad cross-section of academic language and may 
therefore need to treat specialized vocabulary separately. During the course of 
coding we considered the sense intended in the selections and rated only the form 
and usage of words in the context of the given selections. Specifically, we had to be 
sure that the word-sense intended in the passage was referring to an academic 
concept (e.g., "Determine the centrifugal force" versus "Don't force him to do it"). 
This includes the specialized word sense often used in mathematics for some of the 
most common words in English. Prepositions for instance take on very precise and 
often unfamiliar usage in the mathematics register (e.g., Pimm, 1987, 1995; Bailey, in 
press). For example, the preposition "in" used in a phrase like "three in four" makes 
the relationship between the two numbers proportional, in this case three quarters 
or 75%. 

If the intended meaning of a word was the same both in and out of the 
classroom setting, we considered the word usage to be non-academic. Throughout 
this process we were careful to avoid equating unfamiliarity with academic 
vocabulary. While we did not classify the use of seemingly arbitrary proper names 
(such as Peter and Anne) most often found in mathematics word problems as 
academic vocabulary, proper names related to content learning or crucial to the 
academic concepts of a topic were rated as specialized (e.g., Continental Congress, 
Newton, and John Adams). Similarly, measurement vocabulary and abbreviations for 
measurement and formulas were also rated as a subcategory of specialized academic 
vocabulary (e.g., kilometer, km, and km/hr). Colloquialisms/idiomatic expressions 
(e.g., doffer, half-joked, and twister) and verbatim speech formed their own separate 
subcategories if used in service of conveying academic concepts. 

Initially, two researchers reviewed the definitions of academic language in the 
coding guidelines and identified examples in sample texts independently. Ratings 
on these selections were compared for agreement in two ways: (a) simple agreement 
between coders on distinguishing academic vocabulary from non-academic 
vocabulary, regardless of subcategory; and (b) simple agreement between coders on 
distinguishing between specialized and general academic vocabulary. 
Disagreements were discussed with the entire research team and the coding schema 
further refined to remove ambiguities in coding decisions. 

Once the two coders reached 80% agreement on sample texts, they began the 
full coding of the selections independently. Reliability between the two coders was 
calculated on 12 selections, or a third of the data, in each subject area. These 
selections were chosen at intervals across the entire period of coding to ensure that 
fidelity to the coding guidelines was maintained. Simple agreement for 
distinguishing academic vocabulary from non-academic vocabulary averaged .96 
(range .95-.99) for mathematics, .96 (range .95-.98) for science, and .92 (range .91-.93) 
for social studies. Simple agreement for distinguishing specialized academic 
vocabulary from general academic vocabulary averaged .84 (range .74-.90) for 
mathematics, .94 (range .76-1.0) for science, and .91 (range .81-.97) for social studies. 
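
Simple agreement of the kind reported above is the proportion of coded items on which two coders assign the same code. A minimal Python sketch follows; the function name, the code labels ("A" for academic, "N" for non-academic), and the two coders' ratings are hypothetical and are not study data.

```python
def simple_agreement(codes_a, codes_b):
    """Proportion of items to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must rate the same items")
    if not codes_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes for ten words: "A" = academic, "N" = non-academic.
coder_1 = ["A", "N", "A", "N", "N", "A", "N", "N", "A", "N"]
coder_2 = ["A", "N", "A", "N", "A", "A", "N", "N", "A", "N"]
agreement = simple_agreement(coder_1, coder_2)
print(agreement)
```

Simple agreement does not adjust for agreement expected by chance, so it is best read alongside the number of categories being coded.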

The average number of words identified as academic vocabulary across 
selections is provided for each of the topics in the three subjects. The raw numbers of 
academic words and the subtypes we identified are also expressed as percentages of 
all words in the selections by topic. In addition, we also report the total number of 
unique word types among academic English words and the percentage of total word 
types for which these account in each topic. 
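
The two percentages described above, academic words as a share of all word tokens and as a share of all word types, can be sketched as follows. The function name and the sample data are hypothetical. The sketch also assumes that a word type counts as academic if at least one of its tokens was coded as academic, a rule adopted here only for illustration.

```python
def academic_shares(tokens, is_academic):
    """Percentage of tokens and of types coded as academic vocabulary.

    tokens: list of words in running order.
    is_academic: parallel list of booleans, one coding decision per token.
    """
    if len(tokens) != len(is_academic):
        raise ValueError("one coding decision is needed per token")
    if not tokens:
        raise ValueError("no tokens supplied")
    all_types = set(tokens)
    academic_tokens = [t for t, flag in zip(tokens, is_academic) if flag]
    # Assumption: a type is academic if any of its tokens was coded academic.
    academic_types = set(academic_tokens)
    return {
        "token_pct": 100.0 * len(academic_tokens) / len(tokens),
        "type_pct": 100.0 * len(academic_types) / len(all_types),
    }


# Invented eight-word sample in which only "ratio" is coded as academic.
words = ["the", "ratio", "of", "the", "sides", "is", "a", "ratio"]
flags = [w == "ratio" for w in words]
shares = academic_shares(words, flags)
print(shares)
```

Because repeated words count once as types but several times as tokens, the two percentages can diverge, as they do in the tables that follow.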

Findings 

Mathematics. The average number of words identified as academic vocabulary 
per mathematics selection is 60, or about 10% of the total number of words in the 
mathematics subject area (see Table 7). 

Table 7 

Academic Vocabulary in Mathematics by Topic [Percentage of Total Words in Brackets] 

                                           Topic averages                                                Subject-area
Statistic                                  Decimals       Fractions      Multiplication  Ratios          average (SD)
Total no. of all academic word tokens      55 [9.21]      42 [7.61]      62.33 [11.22]   81 [13.15]      60.08 (21.25) [10.30]
Total no. of all academic word types       30 [11.76]     28.67 [11.75]  39.67 [16.12]   43 [16.42]      35.33 (12.12) [14.01]
Total no. of general academic word tokens  18.67 [3.10]   17.33 [3.13]   16.67 [2.99]    21.33 [3.51]    18.50 (6.27) [3.18]
Total no. of general academic word types   13.33 [5.39]   10.67 [4.39]   12.67 [5.07]    12.33 [4.75]    12.25 (5.91) [4.90]
Total no. of specialized word tokens       10.67 [1.74]   13 [2.40]      27 [4.90]       33.67 [5.45]    21.08 (13.19) [3.62]
Total no. of specialized word types        10 [3.69]      13.33 [5.45]   20.33 [8.37]    23.33 [8.85]    16.75 (10.05) [6.59]
Total no. of measurement word tokens       25 [4.25]      11.67 [2.08]   17.67 [3.16]    22.67 [3.63]    19.25 (8.71) [3.28]
Total no. of measurement word types        6.33 [2.55]    4.67 [1.92]    6.33 [2.55]     5.67 [2.17]     5.75 (1.71) [2.30]

Note. A very small number of colloquialisms (7 tokens) and proper nouns (8 tokens) were employed 
in the text selections, though not across all topic areas. These amounted to fewer than 1 word on 
average across the topics. 



There is an average of 35 different word types, accounting for 14% of the 
different word types in mathematics overall. In total there are 275 different academic 
word types. There is some variation across topics in the magnitude of the standard 
deviations; these are about one third of the subject-area averages for both word 
tokens and types. Of the academic words, just 18 on average were identified as 
general academic words (e.g., resulted, officially, reasonable), with 12 of these 
identified as different word types. There is little variation in these numbers across 
topics. 
On average, there are slightly more specialized academic words and 
measurement words than general academic words in mathematics selections. Such 
words include denominator, rectangular, remainder and centigrade, Karat, kilometer for 
specialized academic and measurement words, respectively. The number of 
different measurement word types, however, is small, accounting for just 2% of all 
word types in mathematics due to repetition of common measurement words such 
as meter. Specialized academic words, in contrast, account for nearly 7% of all word 
types. Table 6 also shows that there is considerable variation across mathematics 
topics both in terms of the average number of specialized academic words and 
measurement words used and in the number of different types used in the 
selections. Standard deviations for specialized word tokens and types especially, are 
in excess of half the mean values suggesting some topics, such as Decimals, have far 
less specialized vocabulary than others, namely Multiplication and Ratios. 

Science. The average number of words identified as academic vocabulary in 
the science selections is 131, or about 21% of the total number of words in the science 
subject area (see Table 8). Selections averaged 60 different word types, accounting
for 27% of the different word types in science overall. In total there were 561
different academic word types identified. The standard deviation for word tokens is
relatively low, although variation in the number of different types across topics was
higher, with the magnitude of the standard deviation approaching half the
subject-area average. Just 38 of these words on average were identified as general academic
words, with 24 of these identified as different word types. There is little variation in 
these numbers across topics. 

There are slightly more specialized words in science selections than any other 
category of academic vocabulary. Such words include anther, convection, and 
evaporation. The number of different specialized academic word types is relatively 
modest at just 30 different word types accounting for 14% of all word types in 
science. Table 8 also shows that there is considerable variation across science topics 
in the average number of specialized academic words used and in the number of 
different types employed in the selections. The magnitude of the standard 
deviations is more than one third the value of the means, suggesting that some 
topics, such as Plants, contain far more specialized academic vocabulary than others. 
While the topics in the science subject area also include measurement words,
colloquialisms, and proper nouns, these categories are rare, and the standard
deviations are larger than the mean values, suggesting a large degree of
heterogeneity across topics.

Table 8

Academic Vocabulary in Science by Topic [Percentage of Total Words in Brackets]

                                            Topic averages                             Subject-area
Statistic                                   Matter    Plants    Storms    Water cycle  average (SD)
Total no. of all academic word tokens       153.33    166.33    99.33     105.33       130.83 (31.71)
                                            [22.73]   [25.75]   [17.46]   [19.86]      [21.45]
Total no. of all academic word types        63        71.33     51.33     54.33        60.00 (12.68)
                                            [27.53]   [30.26]   [25.73]   [25.73]      [26.69]
Total no. of general academic word tokens   51.33     36.33     29.33     33           37.50 (10.53)
                                            [7.67]    [5.61]    [5.15]    [6.33]       [6.19]
Total no. of general academic word types    29.67     24.33     20.33     23           24.33 (4.10)
                                            [13.02]   [10.37]   [9.18]    [11.13]      [10.93]
Total no. of specialized word tokens        87        129.33    53.67     67.67        84.42 (31.13)
                                            [12.98]   [20.94]   [9.41]    [12.61]      [13.83]
Total no. of specialized word types         26.33     46        21        29.67        30.75 (12.56)
                                            [11.49]   [19.48]   [9.54]    [13.83]      [13.58]
Total no. of measurement word tokens        13        0         9         1.33         5.83 (7.47)
                                            [1.92]    [0]       [1.64]    [.24]        [.95]
Total no. of measurement word types         6.33      0         3.33      .67          2.58 (3.63)
                                            [2.72]    [0]       [1.56]    [.31]        [1.15]
Total no. of colloquialism tokens a         .33       .67       1         .67          .58 (.89)
                                            [.05]     [.10]     [.17]     [.12]        [.10]
Total no. of colloquialism types a          .33       .67       1         .67          .67 (.89)
                                            [.15]     [.26]     [.43]     [.28]        [.28]
Total no. of proper noun tokens a           .67       .33       6.33      2.67         2.5 (3.50)
                                            [.11]     [.05]     [1.09]    [.56]        [.45]
Total no. of proper noun types a            .33       .33       5.67      .33          1.67 (2.77)
                                            [.15]     [.15]     [2.52]    [.19]        [.75]

a These subcategories are found in the selections and the words in them are not automatically
considered specialized or general academic vocabulary but must meet the academic vocabulary
definition provided above.

Table 9

Academic Words in Social Studies by Topic [Percentage of Total Words in Brackets]

                                            Topic averages
                                            Declaration of  Industrial                         Subject-area
Statistic                                   Independence    Revolution   Pilgrims   Slavery    average (SD)
Total no. of all academic word tokens       185             193          191.33     141.67     177.75 (39.67)
                                            [29.97]         [19.51]      [25.75]    [19.83]    [23.76]
Total no. of all academic word types        111             73.33        99.67      83         91.75 (30.67)
                                            [30.13]         [19.60]      [25.92]    [19.83]    [23.87]
Total no. of general academic word tokens   43              27.33        18.33      28.67      29.33 (17.73)
                                            [5]             [3.05]       [2.12]     [3.04]     [3.30]
Total no. of general academic word types    38.67           23           16         24         25.42 (15.69)
                                            [10.47]         [6.06]       [4.17]     [5.75]     [6.61]
Total no. of specialized word tokens        72.67           107          70         78.33      82.00 (19.48)
                                            [8.55]          [12.02]      [7.87]     [7.99]     [9.11]
Total no. of specialized word types         38.67           46.67        45         36.33      41.67 (11.56)
                                            [10.47]         [12.57]      [11.68]    [8.67]     [10.85]
Total no. of colloquialism tokens a         2.33            1.33         1.67       .67        3.00 (3.22)
                                            [.26]           [.15]        [.19]      [.70]      [.33]
Total no. of colloquialism types a          2.33            1.33         1.67       6          2.83 (2.92)
                                            [.61]           [.36]        [.44]      [1.44]     [.71]
Total no. of proper noun tokens a           66.33           56           100.67     27.33      62.58 (30.83)
                                            [7.78]          [6.22]       [11.27]    [2.83]     [7.03]
Total no. of proper noun types a            30.67           1            36.33      16         21.00 (15.46)
                                            [8.42]          [.26]        [9.46]     [3.82]     [5.49]



a These words are not automatically considered specialized or general academic vocabulary but must 
meet the academic vocabulary definition provided above. A very small number of measurement 
words (5 tokens) and instances of verbatim speech (5 tokens) appear in the text selections but not 
across all topic areas. These amounted to fewer than 1 word on average across topics. 

Social Studies. The average number of words identified as academic 
vocabulary in the social studies selections is 178, or about 24% on average of the total
number of words in the social studies subject area (see Table 9). There is an average
of 92 different word types per topic, accounting for 24% of the different word types
in social studies. In total there were 875 different academic word types identified.
The standard deviations for word tokens and types are relatively low, suggesting
little variation in the number of words and different word types across topics. On
average, just 29, or 3%, of all words in social studies selections were identified as
general academic words, with 25 of these identified as different word types.
Variation in these numbers across topics is relatively high, with the standard 
deviations in excess of half the mean values for both general academic word tokens 
and types. The topic Pilgrims has 18 general academic word tokens (16 different 
types) on average across selections, whereas Declaration of Independence contains 43 
such words on average (39 different types). 

Table 9 shows that there are far more specialized academic words (82) than 
general academic words in social studies on average, and these account for 9% of all 
word tokens and 10% of all word types in this subject area. Such words include 
abolitionist, landowner, and merchant. There is little variation in these numbers across 
topics. The topics in the social studies subject area also include colloquialisms and 
proper nouns. The former are rare, and the standard deviations are larger than the 
mean values, suggesting an enormous degree of idiosyncrasy across topics. Proper 
nouns are more prevalent with 63 word tokens on average (21 different word types), 
accounting for 7% of all word tokens in social studies. 

Summary. Comparing across the three subjects, we see that mathematics 
selections contain fewer academic words as a proportion of all words in the subject 
area. Science and social studies have comparable percentages of academic words. 
Breaking the academic vocabulary down further into subcategories, we find that all 
subject areas have proportionately more specialized vocabulary than any other 
category of words. Mathematics also contains a relatively large number of 
measurement words that rival specialized words in both number and proportion of 
all words in the mathematics selections. Neither science nor social studies contains
many words in other categories, with the exception of social studies containing a
relatively high proportion of proper nouns, such as Congress, Lowell, and Thomas
Jefferson. Few words were found in common across the three subject areas. Just 15
word types out of a possible 275 word types were found in the text selections across
mathematics, science, and social studies. 6 These words included nouns like population
and products, the adjective equal, measurement words such as pound and km, verbs
such as continued, express, produce, and suppose, and the phrase in order to (see
Appendix B for full list).

Overall Word Frequency Counts and Low-Frequency Words 

Word frequency is simply the number of times a word occurred in a text (i.e., 
the number of tokens or instances). For this study, low frequency is operationalized 
as less than or equal to 10 occurrences per million words in a published fifth-grade 
level lexical frequency corpus (Zeno et al., 1995). 

Procedures 

Using the FREQ option of CLAN, we generated word lists that provide the 
count of how frequently each unique word occurred (see Appendix C for word lists 
of the 20 most frequently used words in mathematics, science, and social studies 
selections). We examine these word frequency counts separately by subject,
reporting the 20 most frequently occurring words, the proportion of all words for
which these account, and the proportion of all words accounted for by words
occurring only once in a given subject area. We also provide examples of
these least used words in the text selections.
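
The counts described above can be approximated outside CLAN. The following Python sketch is illustrative only and is not the FREQ program: the function name frequency_profile, the tokenization rule, and the sample sentence are assumptions introduced here. It tallies how often each word occurs, then reports the share of all tokens covered by the most frequent words and by words that occur only once.

```python
import re
from collections import Counter


def frequency_profile(text, top_n=20):
    """Return the top-N words, their share of tokens, and the once-only words with their share."""
    # Assumed tokenization: lowercase alphabetic runs, allowing an internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return [], 0.0, [], 0.0
    counts = Counter(tokens)
    top = counts.most_common(top_n)
    # Share of all tokens accounted for by the top-N words.
    top_share = sum(count for _, count in top) / len(tokens)
    # Words that occur exactly once, and their share of all tokens.
    once = sorted(word for word, count in counts.items() if count == 1)
    once_share = len(once) / len(tokens)
    return top, top_share, once, once_share


# Hypothetical sample text, not taken from the textbook selections.
sample = "The sum of the parts is equal to the whole. Find the sum."
top, top_share, once, once_share = frequency_profile(sample, top_n=2)
print(top, round(top_share, 2), once, round(once_share, 2))
```

In this toy sample the two most frequent words cover 6 of 13 tokens, and 7 words occur only once. The same two ratios correspond to the "top 20" and "occurred just once" percentages reported in the findings below.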

In further analyses, three researchers coded selections for words that met the 
low-frequency criterion based on the corpus used by Zeno et al. (1995). The total 
number of words that were identified as occurring with low frequency is provided 
for each of the topics across subjects. This number is also expressed as a percentage 
of all the words in the selections for each topic. In addition, we also report the total 
number of unique word types among these low-frequency lexical items, and provide 
the percentage of total word types these low-frequency words account for in each 
topic. Accuracy checks were periodically made on 12 selections (a third of the data) 
to ensure that the raters identified words as low frequency according to the 
reference corpus at a desired minimum of 95% correctly identified. Agreement 
between two researchers ranged from 91% to 100%, averaging 99%. 
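
As a rough illustration of the low-frequency criterion (10 or fewer occurrences per million words), the sketch below counts low-frequency tokens and types and expresses each as a percentage. It assumes a lookup table of corpus rates; the rates shown are invented for the example and are not values from Zeno et al. (1995). The function name and the choice to skip words missing from the lookup are likewise assumptions.

```python
def low_frequency_summary(tokens, per_million, threshold=10.0):
    """Count low-frequency tokens and types and express each as a percentage."""
    # A word is low frequency when its corpus rate is at or below the threshold.
    # Words missing from the lookup are not counted as low frequency (assumed policy).
    low_tokens = [t for t in tokens if t in per_million and per_million[t] <= threshold]
    all_types = set(tokens)
    low_types = set(low_tokens)
    return {
        "low_tokens": len(low_tokens),
        "pct_tokens": 100 * len(low_tokens) / len(tokens),
        "low_types": len(low_types),
        "pct_types": 100 * len(low_types) / len(all_types),
    }


# Hypothetical rates per million words, for illustration only.
rates = {"the": 60000.0, "of": 30000.0, "is": 10000.0, "denominator": 4.0, "quotient": 10.0}
tokens = ["the", "denominator", "of", "the", "quotient", "is", "the", "denominator"]
summary = low_frequency_summary(tokens, rates)
print(summary)
```

Because a repeated low-frequency word counts once as a type but several times as a token, the two percentages can diverge, which is the pattern the tables below report separately.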



6 The greatest number of words that could overlap across the three subject areas is in fact the lowest
number of academic words identified in any of the three subject areas, namely that of mathematics.

Findings

Mathematics. The most commonly used words (i.e., the 20 most used words in 
the mathematics selections) account for 2271 (34%) of the total words used in the 
mathematics selections overall. The definite determiner the, among the most 
commonly occurring words in the English language (see for example Francis & 
Kucera, 1982), accounts for 410 instances, almost twice as many tokens as the next 
most commonly used word — the preposition of with 258 tokens. Function words 
such as the indefinite determiner (i.e., a) and common prepositions (e.g., to, in, for), 
and common forms of the copula verb "to be" (e.g., is) account for frequency counts 
in the 100s. The remaining words in the top 20 count range from 83 tokens (each) to 
just 45 tokens (if), and include two personal pronouns (e.g., he, she) and words that 
are semantically related to quantity (e.g., much, many). There were 571 words (8%) 
that occurred just once in the mathematics selections. Many of these were irrelevant 
to the mathematics construct (i.e., names of fictitious or real people and places used 
to provide context for the word problems). A few others appear to be mathematics 
content words (e.g., gain, mode, factor), that, due to their infrequent use, may not be 
widely used at the fifth-grade level or pertain to topic areas not directly selected for 
analysis. 

Overall, there are very few words coded as low-frequency status in any of the 
mathematics selections. Table 10 shows that on average such words only account for 
about 8% of the total word tokens in the mathematics selections. However, if these 
low-frequency words are unfamiliar to the reader, comprehending the text will pose 
a challenge. The standard deviations associated with the subject-area averages for 
types and tokens are moderate, suggesting little variation across the topics in 
mathematics. 

Science. The most commonly used words (i.e., the 20 most used words in the 
science selections) account for 2544 (35%) of the total words used in the science 
selections overall. Again, the definite determiner the accounts for many (585) 
instances, more than twice as many tokens as the next most commonly used word, 
which was the indefinite determiner a with 244 tokens. Remaining counts ranged 
from 236 to 45. While many of the remaining words in the top 20 count are function 
words (e.g., in, to, from), the science selections also employed substantive or content
words in high numbers such as water (144 tokens) and mass (45 tokens). 7 In addition,
there are a number of different verb forms in high usage (e.g., is, was, are, can), 
variety in personal pronouns (e.g., you, it), a demonstrative pronoun (i.e., this), and 
high usage of the conjuncts and and or. There are 618 words (8.5%) that occurred just 
once in the science selections. Many appear to be science content words (e.g., 
adaptive, capacity, organ). However, due to their infrequent occurrence these words 
may be related to topics that were not directly selected for analysis or are not widely 
taught at the fifth-grade level. 



7 The use of content words stands in contrast with function words in the linguistic sense rather than to 
mean content words such as the subject-area specialized terminology of mathematics, science or 
social studies.

Table 10

Low Frequency Words in Three Subjects by Topic [Percentage of Total Words in Brackets]

                                    Mathematics averages (SD)                                       Subject-area
Statistic                           Decimals        Fractions       Multiplication  Ratios          average (SD)
Total no. of low freq. word tokens  44.33 (7.30)    49.67 (8.69)    39 (7.11)       47.67 (7.72)    45.17 (9.68) [7.70]
Total no. of low freq. word types   31 (12.15)      32 (13.09)      26 (10.79)      29.33 (11.21)   29.58 (5.43) [11.81]

                                    Science averages (SD)
                                    Matter          Plants          Storms          Water cycle
Total no. of low freq. word tokens  41 (6.09)       76 (11.84)      47.67 (8.31)    33.33 (6.24)    49.50 (21.16) [8.12]
Total no. of low freq. word types   26.67 (11.61)   35.67 (15.23)   26.00 (11.69)   23.67 (11.07)   28 (7.76) [12.40]

                                    Social Studies averages (SD)
                                    Declaration of  Industrial
                                    Independence    Revolution      Pilgrims        Slavery
Total no. of low freq. word tokens  62 (7.36)       79.67 (8.84)    73.67 (8.22)    91 (9.23)       76.58 (21.39) [8.42]
Total no. of low freq. word types   44.33 (12.33)   47.33 (12.43)   39.67 (10.32)   55.67 (13.28)   46.75 (11.77) [12.01]



There is great variability in the use of words designated as low frequency across
the topic selections for science, as captured in the standard deviation of 21, almost
half the value of the subject-area average. This is logical as words vary considerably
according to topic and as words can be mentioned just once in an effort to exemplify
a point (a common discourse-level feature we discuss in Chapter 5); such one-time
usage is likely a hallmark of conceptually dense prose. Table 10 shows a 43-word
difference between the number of low-frequency word tokens in Water Cycle
selections and Plants selections. The percentage of all words for which these
low-frequency lexical items account varies from a low of 6% to a high of nearly
12%. It is interesting to note that while Plants has far more low-frequency words, it
repeats these same words more often than the other topic areas.

Social Studies. The most commonly used words (i.e., the 20 most used words
in the social studies selections) account for 3484 (32%) of the total words used in the 
social studies selections overall. Again, the definite determiner the accounts for 
many instances (857), nearly three times as many tokens as the next most commonly 
used word, the preposition to with 355 tokens. Remaining counts range from 286 to 
just 53. Many of these words are prepositions (e.g., of, in, for, on, by, from), personal 
or possessive pronominal forms (e.g., they, he, their), and the demonstrative pronoun 
that. The conjuncts and and or occur in the top 20 count. Just one substantive content 
word people is among the 20 most frequently used words (63 tokens). There are 966 
words (9%) that occur just once in the social studies selections. Many of these are 
proper names for real people, places, and events. Others appear to be social studies 
content words (e.g., articles, government, opinion), that due to their infrequent use 
may not be widely used in social studies at the fifth-grade level or else they pertain 
to topic areas not directly selected for analyses. 

There is some variability in the use of low-frequency words across the topic 
selections for social studies. The standard deviation at 21 is approximately a quarter 
of the value of the subject-area average. Table 10 shows a 29-word difference 
between the number of low-frequency words in Slavery selections and Declaration of 
Independence selections. The percentage of all words for which these low-frequency
lexical items account remains relatively consistent at about 8% (+/- 1%).

Summary. Comparing across mathematics, science and social studies, we see 
that the most frequently used words account for just over 30% of all word tokens in 
each of the three subjects. The three subjects share seven words in common among 
the twenty most frequently used in each subject area. These are composed of the 
determiners the and a, the prepositions of, to, in, on and the conjunct and. There are 
differences, however, in the semantic composition of the twenty most frequent 
words across the three subject areas. Specifically there is greater variety in lexical 
usage and greater inclusion of content words alongside more commonly occurring 
function words in the science and social studies selections than in the mathematics 
selections. 

We also see considerable variability in the textbook selections in the
percentages of words that are designated as low-frequency words. Averages for
mathematics and science are relatively similar, with low-frequency words in both
subject areas accounting for about 8% of the total word count for these subjects
overall. Approximately 30 of these words are used at least once on average in both
subjects, and this accounts for about 12% of all the word types in either mathematics
or science.

However, whereas mathematics is extremely uniform across topics, one science 
topic area is exceedingly elevated in the number of low-frequency words compared 
to the other three topic areas. While social studies has about 50% more words 
identified as low frequency on average than the other two subjects in absolute terms, 
this raw number accounts for only a fraction of a percent of the total word count in 
social studies (less than 1 percent on average), as well as a fraction of the different 
word types in social studies (just over 1 percent on average). Like science, social 
studies also has one topic area that had many more low-frequency words than the 
other three topics. The research suggests the relatively minor numeric role of
low-frequency words in the total word counts for all selections, but the relatively greater
role of low-frequency words in terms of total number of different word types in 
mathematics and science. 



Three-or-More-Syllable Words 

Multisyllabic words are often more difficult and less common and have
typically been used as an index of difficulty in readability measures (e.g., Flesch,
1948; Klare, 1974; Zakaluk & Samuels, 1988). The American Heritage Dictionary
defines a syllable as "a unit of spoken language consisting of a single uninterrupted
sound formed by a vowel, diphthong, or syllabic consonant alone, or by any of these
sounds preceded, followed, or surrounded by one or more consonants" (4th edition,
2000).

Procedures 

Three researchers coded the selections. Proper nouns and abbreviations were 
included in the analyses, but numerals or symbols were not. During coding, we 
considered how many syllables abbreviations symbolized. Whenever researchers 
disagreed on the number of syllables in a given word, a dictionary was consulted. 
Accuracy checks were periodically made on one third of the data to ensure that the 
raters identified words with three or more syllables (according to the dictionary) at a 
desired minimum of 95% correctly identified. Agreement between two researchers 
ranged from 99.7% to 100%, averaging 99.9%.

The analyses provide the total number of words that were identified as
multisyllabic words with three or more syllables for each of the topics across the
subject areas. The number of these words is also expressed as a percentage of all
words in the topic selections. In addition, we also report the total number of unique
word types among these 3-or-more-syllable words and the percentage of total word
types for which these account in each topic.
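
A minimal sketch of this tally follows. It assumes that syllable counts come from a lookup table standing in for the dictionary the coders consulted; the table entries, the sample tokens, and the function name are hypothetical. Numerals and symbols are skipped, and an abbreviation such as km is counted by the syllables of the word it stands for, as in the coding procedure.

```python
def multisyllabic_summary(tokens, syllable_counts, minimum=3):
    """Count tokens and types with at least `minimum` syllables, with percentages."""
    # Keep alphabetic tokens only, so numerals and symbols are excluded.
    words = [t.lower() for t in tokens if t.isalpha()]
    # A word missing from the lookup raises KeyError, which flags it for manual checking.
    long_words = [w for w in words if syllable_counts[w] >= minimum]
    return {
        "tokens": len(long_words),
        "pct_tokens": 100 * len(long_words) / len(words),
        "types": len(set(long_words)),
        "pct_types": 100 * len(set(long_words)) / len(set(words)),
    }


# Hypothetical syllable lookup; "km" is counted as "kilometer" (4 syllables).
syllable_counts = {"the": 1, "denominator": 5, "is": 1, "km": 4, "and": 1, "remainder": 3}
tokens = ["The", "denominator", "is", "12", "km", "and", "the", "remainder", "is", "3"]
summary = multisyllabic_summary(tokens, syllable_counts)
print(summary)
```

Raising an error on unknown words, rather than guessing a count, mirrors the practice of consulting a dictionary whenever the syllable count was in doubt.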

Findings 

Mathematics. Very few words contain three or more syllables in mathematics 
(see Table 11). These words account for only 6% of all word tokens on average and 
for slightly more (9%) of all word types. There is some variation across topics with a 
17-word difference from the highest to lowest amount of 3-or-more-syllable words 
contained in the text selections; the standard deviation of 13 is greater than a third of 
the value of the subject-area average for the number of word tokens.

Table 11

Three-or-More-Syllable Words in Three Subjects by Topic [Percentage of Total Words in Brackets]

                                        Mathematics averages (SD)                                       Subject-area
Statistic                               Decimals        Fractions       Multiplication  Ratios          average (SD)
Total no. of ≥ 3 syllable word tokens   37 (6.16)       28.67 (5.18)    26.67 (4.81)    44 (7.12)       34.08 (13.39) [5.80]
Total no. of ≥ 3 syllable word types    26.67 (10.45)   19.67 (8.01)    20 (8.17)       24 (9.20)       22.58 (6.42) [8.95]

                                        Science averages (SD)
                                        Matter          Plants          Storms          Water cycle
Total no. of ≥ 3 syllable word tokens   61.67 (9.14)    62 (9.63)       64.33 (11.29)   43 (8.06)       57.75 (11.28) [9.53]
Total no. of ≥ 3 syllable word types    36 (15.75)      37.67 (16.05)   32.67 (14.71)   27 (12.58)      33.33 (7.24) [14.77]

                                        Social Studies averages (SD)
                                        Declaration of  Industrial
                                        Independence    Revolution      Pilgrims        Slavery
Total no. of ≥ 3 syllable word tokens   123.33 (14.55)  96.67 (10.72)   114.67 (12.79)  99.33 (10.08)   108.50 (17.75) [11.97]
Total no. of ≥ 3 syllable word types    64.33 (17.52)   54.00 (14.19)   65.67 (17.09)   65.00 (15.50)   62.25 (11.62) [16.13]



Science. Table 11 shows that 10% of words in the science selections are
3-or-more-syllable words on average. One topic, namely the Water Cycle, has far fewer
such words than the other three topic areas, which depresses the mean value to
some degree. Otherwise, the variation across topics is not great (standard deviations
are moderate in relation to the respective means). In terms of word types,
three-or-more-syllable words account for about 15% of all word types on average.

Social Studies. Words with three or more syllables account for 12% of words in 
social studies text selections on average (see Table 11). There is a 27-word difference 
in the number of these multisyllabic words across topic areas. However, given the 
large number of multisyllabic words within each of the topics, this amount of
variation is relatively small; the standard deviation at 18 is less than one-fifth the
value of the overall average for number of tokens. Multisyllabic words account for 
about 16% of all word types in social studies. 

Summary. When we compare the three subjects, we see that mathematics has 
far fewer multisyllabic words with 3 or more syllables than either science or social 
studies as a percentage of the total number of words in the selections. In social 
studies more than 16% of all word types are accounted for by words with three or 
more syllables on average. There is some variability across topics within all of the 
subject areas, although this is more a hallmark of mathematics than of science or 
social studies. Implications for test development suggest a relatively important role 
for three-or-more-syllable words in the word count of typical selections in science 
and social studies, but less so in the case of mathematics. 

Derived Words 

Derived words are formally and semantically more complex than their root 
forms and constitute much of the new vocabulary that native English-speaking 
students learn in the upper elementary grades (Anglin, 1993). Derived words are 
defined here as words that have changed grammatical category (part of speech) by 
adding an affix (typically a suffix such as -ation, -ly, -ance, -ness, -ity, etc.). 

Procedures 

Three researchers coded the selections by identifying a word and confirming its 
classification by examining the affix. It was imperative that each identified word 
could exist independently as a word without the affix. Included in the analyses were 
present and past participles functioning as modifiers as well as gerunds. However, 
we did not incorporate words that do not change grammatical category, nor did we 
include compound words, words with derivational relationships that may be direct 
or obscure, forms with no added morphology, or possessive forms. Reliability 
between coders was calculated using two selections from each of the three subject
areas. The number of derived words in these six selections ranged from 476 to 984.
Simple agreement in rating these samples was .99 for both samples in mathematics 
and science, and .98 and .99 for the two selections respectively in social studies. 
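
The report does not give the formula behind the simple agreement figures. A common reading is the proportion of coding decisions on which two coders agree, and the sketch below assumes that reading. The function name and the sample codes are hypothetical.

```python
def simple_agreement(coder_a, coder_b):
    """Proportion of items on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same items.")
    if not coder_a:
        raise ValueError("At least one rated item is required.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical codes: True means the coder marked the word as derived.
coder_a = [True, False, True, True, False]
coder_b = [True, False, False, True, False]
agreement = simple_agreement(coder_a, coder_b)
print(agreement)
```

Simple agreement does not correct for agreement expected by chance, so it tends to run high when one code, such as "not derived," dominates the sample.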

First the total number of words that were identified as morphologically derived 
from root forms is provided for each of the topics across the subjects. The number of
derived words is also expressed as a percentage of words in the topic selections. In
addition, we also report the total number of unique word types among these 
derived words and the percentage of total word types these account for in each 
topic. 

Findings 

Mathematics. Table 12 shows that there are very few derived words in the 
mathematics selections, although there is variation across topics; the standard 
deviation of 7 is large at half the value of the overall average. The Ratios topic has 
more derived words than the other three topics. On average, derived words account 
for just 2% of all word tokens and about 4% of all word types in mathematics. 

Science. On average, 35 words in the science selections are derived from root 
forms and account for about 6% of all words in the science selections (see Table 12). 
These words account for about 11% of the total number of different word types on 
average. The standard deviations are moderate, suggesting the results are relatively 
uniform across the individual topic selections. 

Social Studies. From Table 12, we see that the number of derived word tokens
varies somewhat by topic, with the standard deviation at 24 being about a
third of the value of the overall subject-area average. There is a 45-word difference
between the topic with the lowest number of derived words (Pilgrims) and the topic
with the highest number (Slavery). On average, derived words account for 8% of all
words in social studies selections. Derived word types account for 12% of all word
types in social studies on average.

Table 12

Derived Words in Three Subjects by Topic [Percentage of Total Words in Brackets]

                                    Mathematics averages (SD)                                       Subject-area
Statistic                           Decimals        Fractions       Multiplication  Ratios          average (SD)
Total no. of derived word tokens    9.67 (2.62)     14.67 (2.71)    9.33 (1.68)     19.67 (3.18)    13.33 (7.10) [2.31]
Total no. of derived word types     7.67 (3.01)     10 (4.09)       6.33 (2.55)     15.33 (5.87)    9.83 (4.78) [3.88]

                                    Science averages (SD)
                                    Matter          Plants          Storms          Water cycle
Total no. of derived word tokens    38.33 (5.67)    39.67 (6.18)    30.67 (5.35)    31 (4.46)       34.92 (9.82) [5.76]
Total no. of derived word types     26.67 (11.66)   25.33 (10.86)   22.67 (10.19)   22.33 (10.53)   24.25 (5.33) [10.81]

                                    Social Studies averages (SD)
                                    Declaration of  Industrial
                                    Independence    Revolution      Pilgrims        Slavery
Total no. of derived word tokens    69.67 (8.31)    81.00 (9.10)    49.00 (5.47)    94.33 (9.57)    73.50 (24.13) [8.11]
Total no. of derived word types     42.33 (11.91)   51.33 (13.56)   39.67 (10.32)   56.00 (13.35)   47.33 (9.89) [12.26]



Summary. Comparing across the three subjects, we see that mathematics 
selections contain few derived words. Science and social studies have modest 
amounts more. Science is homogeneous across topics, whereas mathematics and 
social studies each have topic areas with differing numbers of derived words. These 
findings have implications for test development that suggest derived words play 
only a minor role in mathematics selections and slightly larger roles for the other 
two subject areas at the fifth-grade level. However, analysis of later grade levels may 
provide a different picture. 







Comparison Across Lexical Analyses 

Comparing the vocabulary lists generated in the analysis of academic vocabulary 
with those from the analyses of the three key lexical features already examined 
informs how we can most effectively select words for test development purposes. If 
academic vocabulary can be predicted from one or a combination of the three key 
lexical features, we may identify vocabulary for assessment more efficiently than if 
we apply a more laborious and potentially subjective coding schema of academic 
language criteria. 

Procedures 

Using CLAN programs on combinations of word lists for academic vocabulary, 
low-frequency words, words with 3 or more syllables, and derived words, we 
calculated the number of academic word types that also appeared in each of the 
three lexical category lists (a) as a proportion of all words in each lexical 
category individually, and (b) as a proportion of all academic words. These were 
calculated separately for the three subject areas. 
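
The two proportions described in this paragraph can be illustrated with simple set operations. The following is a minimal sketch in Python, not the CLAN procedure used in the study; the word lists are invented toy data, and the function name overlap_proportions is hypothetical.

```python
def overlap_proportions(academic, feature_list):
    """Return (a) the share of the feature list that is academic and
    (b) the share of the academic list that appears on the feature list."""
    academic = set(academic)
    feature = set(feature_list)
    shared = academic & feature  # word types appearing on both lists
    prop_of_feature = len(shared) / len(feature) if feature else 0.0
    prop_of_academic = len(shared) / len(academic) if academic else 0.0
    return prop_of_feature, prop_of_academic


# Toy data, invented for illustration (not taken from the textbook selections).
academic_words = {"evaporation", "ratio", "colony", "compare"}
derived_words = {"evaporation", "quickly", "colony", "teacher", "happiness"}

a, b = overlap_proportions(academic_words, derived_words)
print(f"(a) proportion of derived words that are academic: {a:.2f}")
print(f"(b) proportion of academic words that are derived: {b:.2f}")
```

Proportion (a) corresponds to the kind of figure reported in Table 13, and proportion (b) to the kind reported in Table 14.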

We then compared the four word lists for academic vocabulary, low-frequency 
words, 3-or-more syllable words, and derived words to tally the number of word 
types that occurred uniquely (i.e., no overlap across lists), occurred twice (i.e., 
overlapped 50% by appearing on two word lists only), occurred three times (i.e., 
overlapped 75% by appearing on three word lists only), and finally occurred four 
times (i.e., overlapped 100% by appearing on all four word lists). Again, we 
calculated this overlap in word lists separately by subject area. 
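
The tally of how many lists each word type appears on can be sketched as follows. This is an illustrative Python example under assumed toy data; the list names and the helper tally_overlap are hypothetical and do not reproduce the CLAN commands used in the study.

```python
from collections import Counter


def tally_overlap(word_lists):
    """Count word types by the number of lists on which they appear."""
    appearances = Counter()
    for words in word_lists.values():
        appearances.update(set(words))  # each list counts a word type once
    # Map "number of lists" -> "number of word types with that many appearances".
    return Counter(appearances.values())


# Toy data, invented for illustration only.
lists = {
    "academic": {"evaporation", "ratio", "colony", "compare"},
    "low_frequency": {"evaporation", "colony", "musket"},
    "three_plus_syllables": {"evaporation", "happiness", "colony"},
    "derived": {"evaporation", "happiness", "teacher"},
}

tally = tally_overlap(lists)
for n in range(1, 5):
    print(f"on {n} list(s): {tally.get(n, 0)} word type(s)")
```

A word type counted under 1 occurs uniquely (no overlap), while a word type counted under 4 appears on all four lists (100% overlap).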

Findings 

Individual Predictions of Academic Vocabulary. Table 13 shows that in 
mathematics selections only 20% of the words that are low frequency at the fifth 
grade according to the criterion of Zeno et al. (1995) were also identified by raters 
as academic vocabulary. This percentage increases to just above 30% for 
3-or-more-syllable words and derived words. In science and social studies, 
far more low-frequency words were identified as academic vocabulary than in 
mathematics. Similar results were obtained with 3-or-more-syllable words. A total of 
81% of derived words in science were also identified as academic vocabulary. This 
percentage is lower in social studies (64%). 

Table 13 

Proportion of Low Frequency Word, 3-or-More-Syllable-Word, and Derived 
Word Types Identified as Academic Word Types by Subject 

Subject           Low freq.   3-or-more-syllables   Derived
Math              .20         .34                   .33
Science           .61         .52                   .81
Social Studies    .58         .54                   .64


Next, looking at the number of words identified as low frequency, 
multisyllabic, or derived as proportions of all academic vocabulary identified, we 
see strong subject-area effects. Table 14 shows mathematics displaying less 
association between the academic English words identified and the words identified 
in the three other analyses. 



Table 14 

Proportion of Academic Vocabulary Accounted for by Low Frequency Word, 3-or- 
More-Syllable-Word, and Derived Word Types by Subject 

Subject           Low freq.   3-or-more-syllables   Derived
Math              .11         .10                   .16
Science           .23         .17                   .28
Social Studies    .23         .20                   .29


In mathematics, each of the three categories accounts for only 10% to 16% of all 
academic words. Science and social studies pattern similarly to each other, and in 
both subjects the three categories account for larger proportions of academic 
vocabulary. Derived words in particular account for nearly 30% of all words that 
were identified as academic vocabulary in these two subjects. While this is 
promising for selecting a sizable number of words for test development purposes, it 
means of course that about 70% of all words identified as academic in science and 
social studies are not derived, and even fewer are identifiable as low frequency or 
consisting of 3 or more syllables. 

Degree of Concordance Across Lexical Analyses. Another way of looking 
at the degree of predictability of academic language from lexical features of words 
also suggests that we will miss many words identified as academic vocabulary if we 
rely on these analyses alone or even in combination. Table 15 shows that in 
mathematics texts the majority (72%) of word tokens identified as academic English 
vocabulary had none of the features of the three lexical analyses we also performed 
on words in the selections. Only 20% shared one other lexical feature we had 
analyzed: low frequency, 3 or more syllables, or derived form. Even fewer 
shared two or more features, and just 1% of words identified as academic 
vocabulary in the mathematics selections overlapped with all three of the lexical 
features. Science and social studies fared somewhat better and again were very 
similar in pattern. Just over 50% of academic English words shared none of the three 
lexical features we also analyzed (conversely, approximately 45% showed at least 
one lexical feature), about 25% shared one feature, about 15% shared two, and 5% of 
the words shared all three lexical features. 

Table 15 

Number of Academic Words Appearing in Any Combination with Three Lexical Features by Subject 

Subject           No overlap   50% overlap       75% overlap        100% overlap
                               (any 1 feature)   (any 2 features)   (all 3 features)
Math              468 [72]     129 [20]          40 [6]             9 [1]
Science           430 [56]     193 [25]          101 [13]           39 [5]
Social Studies    697 [55]     335 [26]          169 [15]           64 [5]

Note. Percentage of words within subject area in brackets. Percentages may not sum 
to 100 due to rounding. 



For test development, these findings suggest that the vagaries of what makes a 
word "academic" (especially in mathematics) make academic vocabulary largely 
impossible to predict from one lexical feature alone or even from combinations of 
lexical features. If we were forced to choose one such lexical feature, however, 
derived words would be the most promising for all three subjects, and for science 
and social studies in particular. 



Additional Lexical Analyses 

Two additional areas of analysis were included to complement the description 
of vocabulary usage in the text selections. Although clause connectors (in both 
number and variety) and nominalizations were relatively few in these 
fifth-grade texts across all subjects, they are included here in the development of a 
comprehensive method for text analyses because of the potential for these features 
to be more elaborate in the texts of later grades. 

Clause Connectors. We focused on two types of connectors in our analysis: 
adverbial dependent clause connectors (e.g., because, when, if) and coordinate clause 
connectors (e.g., and, but, or, nor). Adverbial clause connectors introduce a finite 
adverbial clause and signal the relationship between that clause and the rest of the 
sentence. Coordinate clause connectors, on the other hand, join two independent 
clauses occurring within a single sentence. Clause connectors commonly signal 
meaning relationships between clauses and occur frequently in written 
expository texts. Previous research suggests that students may not interpret 
adverbial clause connectors correctly and therefore misunderstand the meaning 
relationships they encode (e.g., Celce-Murcia & Larsen-Freeman, 1983; Halliday & 
Hasan, 1976). 

Procedures 

Adverbial dependent clause connectors were identified for each subject area 
(see Appendix D for a list of the clause connectors identified in the selections). For 
each adverbial dependent and coordinate clause connector, we calculated the raw 
frequency, and the percentage of the total number of words each constituted for the 
subject areas. For the adverbial dependent clause connectors we also calculated the 
rate of tokens and types per 100 sentences and per 1000 words. 

Reliability was calculated by two researchers. Percentage of initial agreement is 
the percentage agreed upon independently, whereas consensus agreement is 
percentage agreed upon after discussion of discrepancies. The average percentage of 
initial agreement was 85.6%, and the interrater consensus agreement was 100%. 
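
As a rough illustration of initial agreement, the percentage can be computed as the share of items that two raters coded identically before discussion. The sketch below is in Python and assumes simple item-by-item percent agreement; the report does not spell out the exact formula, and the codings shown are invented.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters assigned the same code."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same items")
    if not rater_a:
        raise ValueError("No items to compare")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Invented codings for five candidate words (True = coded as a clause connector).
rater_a = [True, True, False, True, False]
rater_b = [True, False, False, True, False]
print(percent_agreement(rater_a, rater_b))  # 80.0
```

Consensus agreement would be computed the same way after the raters have discussed and resolved their discrepancies.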

Findings 

Mathematics. We identified a total of 7 adverbial clause connectors in the 
dependent clauses and an additional 4 connectors used in coordinate clauses (see 
Chapter 4 for a description of the total number and type of clauses). The frequency 
counts are a total of 56 instances for the dependent clause connectors and 24 for 
coordinate clause connectors. The most frequent adverbial clause connector is the 
connector if (70% of the total number of all adverbial connectors), followed by when 
(13%). Frequent coordinate clause connectors include and (50% of the total number 
of coordinate connectors) and but (42%). 

Science. We identified 15 adverbial clause connectors used in dependent 
clauses and 2 connectors in the coordinate clauses. The frequency counts are 101 
instances for the dependent clause connectors (see Table 16) and 11 for the 
coordinate clause connectors. The most frequent adverbial connectors are when 
(27%), because (15%), if (10%), and as (27%). The only frequent coordinate 
connector is and (91%). 

Social Studies. We identified 17 different adverbial clause connectors used in 
dependent clauses and 4 connectors in coordinate clauses. The frequency counts are 
a total of 78 instances for dependent clause connectors and 28 for coordinate clause 
connectors. The most frequent adverbial clause connectors are when (23%), as (19%) 
and after (12%). Frequent coordinate clause connectors include but (57%) and and 
(36%). 

Table 16 

Adverbial Dependent Clause Connectors in Three Subjects 

Statistic                          Math      Science   Social Studies
No. of tokens                      56        101       78
No. of tokens per 100 sentences    8.56 a    17.84     9.61
No. of tokens per 1000 words       7.99 b    13.91     7.17
No. of types                       7         15        17
No. of types per 100 sentences     1.07      2.65      2.09
No. of types per 1000 words        1.00      2.07      1.56

a To derive this number, we divide the number of word tokens (types) by the number of 
sentences identified in all math selections. We then multiply the number by 100 and 
round it to two decimal places. In this case, the number of word tokens is 56, and the 
number of sentences in math is 654: (56/654)*100 = 8.56 

b To derive this number, we divide the number of word tokens (types) by the number of 
words identified in all math selections. We then multiply the number by 1000 and 
round it to two decimal places. In this case, the number of word tokens is 56, and the 
number of words in math is 7008: (56/7008)*1000 = 7.99 
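
The rate calculation described in notes a and b can be reproduced with a few lines of Python. This is a minimal sketch using the mathematics figures given in the notes (56 tokens, 654 sentences, 7008 words); the function name rate_per is hypothetical.

```python
def rate_per(count, total, per):
    """Occurrences per `per` units (e.g., per 100 sentences), to two decimals."""
    return round(count / total * per, 2)


tokens = 56        # adverbial connector tokens in mathematics (Table 16)
sentences = 654    # sentences in all mathematics selections
words = 7008       # words in all mathematics selections

print(rate_per(tokens, sentences, 100))   # tokens per 100 sentences: 8.56
print(rate_per(tokens, words, 1000))      # tokens per 1000 words: 7.99
```

The same function applies to types by passing the number of types (7 for mathematics) in place of the token count.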

Summary. Using rate data to compare across subject areas, we see in Table 16 
that science has more unique types of adverbial dependent clause connectors per 100 
sentences and per 1000 words than either mathematics or social studies. Science 
selections also employ adverbial dependent clause connectors far more often than 
either mathematics or social studies. To keep this in perspective, however, even in 
science selections this amounts to only one adverbial clause connector every sixth 
sentence or every 72 words. 
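
The "every sixth sentence" and "every 72 words" figures follow from inverting the science rates in Table 16. A short Python sketch of that arithmetic is given below; the function name interval_from_rate is hypothetical.

```python
def interval_from_rate(rate, per):
    """Average number of units between occurrences, given `rate` per `per` units."""
    return per / rate


# Science rates for adverbial dependent clause connectors (Table 16).
print(round(interval_from_rate(17.84, 100)))    # sentences per connector: 6
print(round(interval_from_rate(13.91, 1000)))   # words per connector: 72
```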

Taking both adverbial dependent clause connectors and coordinate clause 
connectors into consideration, we found that all three subject areas have some 
frequently occurring connectors in common, including and, when, if, and but. The 
most frequent dependent clause connectors are if and when, whereas the most 
frequent coordinate clause connector is and. 

Nominalizations. Nominalizations encode actions and processes (e.g., verb 
forms), states and notions (e.g., adjectives), or circumstances (e.g., adverbs) as nouns 
or noun phrases (e.g., stratification, abstractness, and quickness). They condense 
information within academic texts and thus increase the processing load for 
students (Gibbons, 1998; Martin, 1991; Schleppegrell, 2004). Nominalizations are 
utilized most frequently in academic and technical prose and are a hallmark of 
advanced literacy (Schleppegrell, 2004). Despite the relatively low frequency of 
nominalization in fifth-grade texts, the analysis is included for the purposes of 
developing a comprehensive method of text analyses for use in later grade levels. 

Procedures 

For each selection, we identified all the nominalizations and counted the 
number of tokens and types. We then calculated the proportion of nominalizations 
per selection as a percentage of the word tokens and types, the averages for each 
topic and subject area, and the rate of tokens and types per 100 sentences and per 
1000 words. Reliability was calculated by two researchers independently. Initial 
agreement between two researchers ranged from 92.8% to 99.1%, and the consensus 
interrater agreement was 100%. 

Findings 

Mathematics. Table 17 provides the results of the analyses of nominalizations 
(e.g., enlargement) in mathematics. The average number of nominalizations is fairly 
small, with a range of 2 to 5 nominalizations identified per topic. There are only 2 
unique nominalizations each in Decimals, Fractions, and Multiplication, and 4 in 
Ratios. Across topics, there is an average frequency of 4 nominalizations per 
selection, with an average number of 3 unique nominalizations. The average 
proportion of nominalization tokens and types per selection is less than 1% of tokens 
and types across topics. 

Table 17 

Use of Nominalizations in Three Subjects by Topic [Percentage of Total Words in Brackets] 

Mathematics averages 

Statistic                 Decimals        Fractions       Multiplication  Ratios          Subject-area
                                                                                          average (SD)
Total no. of              2               4               3               5               3.58 (2.84)
nominalizations (tokens)  [0.33]          [0.76]          [0.59]          [0.83]          [0.63]
Total no. of              2               2               2               4               2.42 (1.68)
nominalizations (types)   [0.63]          [0.82]          [0.78]          [1.55]          [0.95]

Science averages 

Statistic                 Matter          Plants          Storms          Water Cycle     Subject-area
                                                                                          average (SD)
Total no. of              15              17              10              9               12.58 (7.03)
nominalizations (tokens)  [0.02]          [2.60]          [1.65]          [1.84]          [2.06]
Total no. of              9               9               6               6               7.58 (3.80)
nominalizations (types)   [3.96]          [3.99]          [2.52]          [3.20]          [3.42]

Social Studies averages 

Statistic                 Declaration of  Industrial      Pilgrims        Slavery         Subject-area
                          Independence    Revolution                                      average (SD)
Total no. of              40              16              9               21              21.42 (12.62)
nominalizations (tokens)  [4.74]          [1.74]          [1.03]          [2.12]          [2.41]
Total no. of              17              8               7               11              10.92 (5.12)
nominalizations (types)   [4.76]          [2.17]          [1.74]          [2.70]          [2.84]



Science. Table 17 provides the data on nominalizations (e.g., heaviness) in 
science. On average, 13 nominalizations (tokens) were identified per selection across 
topics, with a range of 9 to 17 nominalizations per selection. There is a range of 6 to 9 
unique nominalizations (types) in each selection per topic. Average percentages of 
both nominalization tokens and types in selections are small (2% and 3%, 
respectively), and variation across topics is minimal. 

Social Studies. Table 17 also provides the data on nominalizations (e.g., 
strengthened) in social studies. An average of 21 nominalizations (tokens) and 11 
unique types were identified per selection across topics. There is some variation 
across topics in the frequency of occurrence, ranging from 9 nominalizations 
(tokens) in Pilgrims to 40 in Declaration of Independence. There is a range of 7 to 17 
unique types of nominalization per selection across topics. Overall, the average 
percentage of unique types of nominalization in selections is small (3% across 
topics), with most of the variation occurring in Declaration of Independence, which has 
the largest proportion of nominalizations. 



Table 18 

Nominalizations in Three Subjects 

Statistic                          Math      Science   Social Studies
No. of tokens                      43        151       257
No. of tokens per 100 sentences    6.57      26.68     31.65
No. of tokens per 1000 words       6.14      20.80     23.63
No. of types                       19        61        80
No. of types per 100 sentences     2.91      10.78     9.85
No. of types per 1000 words        2.71      8.40      7.35


Summary. Although there are very few nominalizations in the selections 
analyzed, there is some variation in the amount identified across the subject areas. 
Comparing the proportion of nominalizations across subjects, the average 
percentages of nominalization tokens per selection are small (0.63% in mathematics, 
2.06% in science, and 2.41% in social studies). There is more variation across topics 
in social studies than in mathematics and science. Using rate data to compare across 
subject areas, we see in Table 18 that science and social studies pattern similarly to 
one another inasmuch as they have far more nominalizations per 100 sentences and 
per 1000 words than mathematics (approximately one nominalization every third 
sentence or every 48 words). Science and social studies text selections also have 
more unique types of nominalization than mathematics per 100 sentences and per 
1000 words. 



Chapter Summary 

In general, the mathematics texts we analyzed here seem quite different from 
both science and social studies texts; social studies texts seem the most complex in 
terms of the vocabulary identified with complex features (i.e., more demanding 
vocabulary). The findings are relatively uniform across topics within subject areas 
for most lexical features. Derivational forms of words identify academic vocabulary 
more systematically than any other lexical feature we examined here, especially 
in science and social studies, at least at the fifth-grade level. A small set of 
non-academic words (e.g., prepositions, determiners, and conjunctions) was used 
highly repetitively across all three subject areas. However, there 
were relatively few similarities in general academic vocabulary used across the 
subject areas (just 15 academic words in common). Also striking is the infrequent use 
of general academic vocabulary relative to specialized academic vocabulary in any 
of the three subject areas (e.g., just 3% of all word tokens in social studies are 
general academic words compared with 9% for specialized academic words). This is 
in contrast with the tertiary education level findings reported in Nation (2001) that 
suggest general academic words are used more frequently than specialized words. It 
is possible that the inconsistencies in results are due to developmental differences 
whereby far more words are considered specialized academic words due to the 
more encompassing "generalist" nature of the subject areas at the elementary school 
level than at the college level. 8 



8 However, in more recent work, Chung and Nation (2003) coded tertiary level texts much as we 
coded the fifth-grade texts here and found that specialized words accounted for as much as 31% of all 
words in technical fields such as anatomy and 21% of all words in specialty fields such as applied 
linguistics (general academic vocabulary was identified by the Academic Word List [Coxhead, 2000] 
and accounted for 4% and 7% of words in these texts, respectively). 

CHAPTER 4: 



GRAMMATICAL ANALYSES 

An important component of the profiles created for test development is the 
characterization of grammatical features of the texts. Thus, grammatical analyses 
were performed on the selections for the purpose of providing basic descriptive 
information for the text profiles. The analyses selected reflect multiple 
considerations, including the need to describe, at the most basic level, the sentence 
types in a particular subject area as well as the features that may differ according to 
grade level or subject area. We investigated the following grammatical features: (a) 
sentence types (e.g., simple, compound, complex, and compound/complex), (b) 
dependent clauses, (c) passive constructions, (d) prepositional phrases, (e) noun 
phrases, and (f) participial modifiers. While this is not an exhaustive list of 
grammatical features, some of them represent known sources of reading difficulty 
for English learners (e.g., Celce-Murcia & Larsen-Freeman, 1983) and may also 
help to characterize academic texts (e.g., Schleppegrell, 2001, 2004); if so, they should 
be taken into consideration in creating tests of academic text comprehension. A 
discussion of the analyses for each of the features follows in turn below. 

Sentence Types and Clauses 

The range of sentence types and the complexity of clausal structures may 
contribute to increases in the reading difficulty of academic texts for students as they 
progress through the grades. Thus, our goal in the current research is to gather data 
that helps characterize fifth-grade texts across the three subjects and also to ascertain 
what features of textbook language are most critical for students at this grade level. 
Ultimately these data will be compared to similar data across grade levels to identify 
changes in grammatical features as the language of textbooks becomes more 
complex. 

Procedures 

We classified sentences into four categories: simple, compound, complex, and 
compound/complex. A simple sentence contains one independent clause (e.g., The 
ice on the river melts quickly under the warm sun.). 9 A compound sentence contains two 
or more independent clauses joined by a conjunction (e.g., and, but) or a punctuation 
mark (e.g., semicolon, colon). The following sentence is an example of a compound 
sentence joined by the conjunction but: Some deserts are very hot in the daytime, but 
temperatures can drop below freezing at night. After the first independent clause in a 
sentence, all subsequent independent clauses were counted as coordinate clauses. A 
complex sentence contains one or more dependent clauses (typically defined as 
clauses that cannot stand alone) in addition to an independent (i.e., main) clause 
(e.g., Although human beings don't notice the noises of nature, a lot of animals react to the 
sounds around them.). A compound/complex sentence contains one or more 
dependent clauses and two or more independent clauses (e.g., Washington soon 
realized that the Nation was not functioning well, so he became an advocate in the movement 
leading to the Constitutional Convention.). For the purposes of this study, we identified 
the following as instances of dependent clauses: relative (adjectival) clauses (e.g., The 
number that he found was less than 10.); adverbial clauses of time, circumstance, 
manner, purpose, and condition (e.g., He counted them as he put them away.); and 
clausal structures functioning as subject, object, predicate nominal, or object of 
preposition (e.g., He wrote about how they had survived.). 10 

Next we calculated the percentage of the total number of sentences in each 
selection that fell into each of the four categories. We also counted the number of 
dependent clauses and the number of coordinate clauses. To determine the total 
number of clauses for the selection, we added the number of dependent clauses, 
coordinate clauses, and sentences (i.e., main clauses). We then calculated the 
percentage of total clauses that were dependent and the percentage that were 
coordinate. In addition, for all sentence and clause types, we calculated the average, 
range, and standard deviation at the topic and subject area levels. 
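
The clause totals and percentages described in this paragraph can be sketched as follows. The example is in Python and uses the mathematics counts reported in Table 20 (654 sentences, 242 dependent clauses, 16 coordinate clauses); the function name clause_percentages is hypothetical.

```python
def clause_percentages(sentences, dependent, coordinate):
    """Total clauses and the percentage contributed by each clause type.

    Each sentence contributes exactly one main clause, so the total is
    sentences + dependent clauses + coordinate clauses.
    """
    total = sentences + dependent + coordinate
    return {
        "total_clauses": total,
        "pct_main": round(100 * sentences / total, 2),
        "pct_dependent": round(100 * dependent / total, 2),
        "pct_coordinate": round(100 * coordinate / total, 2),
    }


# Mathematics counts as reported in Table 20.
result = clause_percentages(sentences=654, dependent=242, coordinate=16)
for name, value in result.items():
    print(f"{name}: {value}")
```

With these counts the sketch yields 912 total clauses, of which 71.71% are main, 26.54% dependent, and 1.75% coordinate, matching Table 20.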

Accuracy checks were performed at intervals on a total of 15% of the data 
analyzed to assure consistency in ratings, with a target accuracy rate of 90%. The 
overall accuracy rate across subject areas for rating dependent clauses was 86%, with 
a range of 66% to 95.5% within subjects. Interrater agreement for mathematics was 
low, initially in the range of 67% to 82%, so an additional check was performed 
during the analyses, resulting in average agreement for mathematics of 80%, with a 
range of 66% to 92%. The results of the analyses follow below. 

9 Examples were created by the authors to model the features identified in the textbook selections, 
unless otherwise indicated. 

10 Since the English language contains a continuum of structures from lexical to fully clausal, 
principled distinctions between clausal structures and phrases are not always obvious. 

Findings 

The results of the sentence and clause type analyses are presented by subject 
(mathematics, science, and social studies), followed by a summary of the findings 
across subject areas. Where relevant, variation at the topic level is discussed. 
(Appendices E and F provide topic level data for sentence and clause types, 
including aggregate counts and averages for topics within each content area.) 

Mathematics. Tables 19 and 20 provide the number and percent of types of 
sentences and clauses in the mathematics selections. 

Table 19 

Sentence Types in Mathematics Selections 

                     Total Number   % of Total
Simple               531            81.19
Complex              111            16.97
Compound             10             1.53
Compound/complex     2              .31
Total                654            100.00



Table 20 

Clause Types in Mathematics Selections 

                       Total Number   % of Total
Main clauses           654            71.71
Dependent clauses      242            26.54
Co-ordinate clauses    16             1.75
Total                  912            100.00

Simple sentences comprise the majority of all sentences in mathematics. 
Specifically, 81% are simple sentences; 17% are complex sentences; 2% are 
compound sentences; and less than 1% are compound/complex sentences. Among 
all the clauses identified, approximately 27% are dependent clauses and only 2% are 
coordinate clauses. 

Science. Tables 21 and 22 provide the number of sentences and clauses in the 
science selections. 



Table 21 

Sentence Types in Science Selections 

                     Total Number   % of Total
Simple               351            62.01
Complex              200            35.34
Compound             10             1.77
Compound/complex     5              .88
Total                566            100.00



Table 22 

Clause Types in Science Selections 

                       Total Number   % of Total
Main clauses           566            68.69
Dependent clauses      242            29.37
Co-ordinate clauses    16             1.94
Total                  824            100.00



As in mathematics, the majority of sentences in science are simple sentences 
(62%), followed by complex (35%). Compound and compound/complex sentences 
constitute only 2% and 1% of all sentences, respectively. Across topics, there is slight 
variation in the distribution of simple sentences, with the average percentages of 
simple sentences ranging from 58% to 68%. 

Social Studies. Tables 23 and 24 provide the sentence type and clause data for 
social studies. 



Table 23 

Sentence Types in Social Studies Selections 

                     Total Number   % of Total
Simple               518            64.02
Complex              258            31.89
Compound             23             2.84
Compound/complex     10             1.24
Total                809            100.00



Table 24 

Clause Types in Social Studies Selections 

                       Total Number   % of Total
Main clauses           812            68.41
Dependent clauses      339            28.56
Co-ordinate clauses    36             3.03
All clause types       1187           100.00



Again, as with mathematics and science, simple sentences make up the 
majority of sentences in social studies: 64% are simple sentences, 32% are complex 
sentences, 3% are compound sentences, and 1% are compound/complex sentences. 
The distribution of simple sentences varies by topic, with the percentage of 
simple sentences as a proportion of all sentences ranging from 48% to 71%. 11 The 
mean number of clauses across topics in social studies is 99; approximately 29% are 
dependent clauses and 3% are coordinate clauses, with variation in the average 
number of dependent clauses across topics. 12 

11 One topic contained many more complex sentences than the other three topics (Declaration of 
Independence contains an almost identical number of simple [48.41%] and complex sentences 
[46.81%]), indicating that there may be some variability according to topic in terms of sentence 
complexity. (Appendices E and F provide topic level data for sentence and clause types, including 
aggregate counts and averages for topics within each content area.) 

Summary. The data show that across the three subjects at the fifth-grade level 
the majority of sentences for the selections analyzed are simple, with a mean of 81% 
of all sentences in mathematics being simple sentences, 62% in science, and 64% in 
social studies. Seventeen percent of all sentences in mathematics, 35% in science, and 
32% in social studies are complex sentences. Compound and compound/complex 
sentences constitute only a small proportion in all subjects, ranging from 0 to 3% per 
selection on average. 

There is some variation in the composition of sentence types across subjects. 
The ratio of simple to complex sentences is nearly 5:1 in the mathematics selections, 
but it is less than 2:1 in science and social studies. Additionally, there are slightly 
more compound and compound/complex sentences in social studies than in 
mathematics and science, although the proportions of both sentence types are small 
overall. 
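
The simple-to-complex ratios cited in this paragraph can be checked against the counts in Tables 19, 21, and 23. The following Python lines are a small illustrative sketch; the function name simple_to_complex_ratio is hypothetical.

```python
def simple_to_complex_ratio(simple, complex_):
    """Number of simple sentences per complex sentence, to one decimal."""
    return round(simple / complex_, 1)


# (simple, complex) sentence counts from Tables 19, 21, and 23.
counts = {
    "Mathematics": (531, 111),
    "Science": (351, 200),
    "Social Studies": (518, 258),
}
for subject, (simple, complex_) in counts.items():
    print(f"{subject}: {simple_to_complex_ratio(simple, complex_)}:1")
```

On these counts mathematics comes out at about 4.8:1, science at about 1.8:1, and social studies at about 2.0:1.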

Across subject areas, there are many more dependent clauses than coordinate 
clauses. The distribution of both clause types is fairly uniform across topics in all 
three subjects. The mean proportion of dependent clauses as a percentage of total 
clauses ranges from 27% to 29%, while the proportion of coordinate clauses is only 
2% to 3%. 



Use of Passive Verb Forms 

The use of passive voice verb forms in textbooks is typically thought to 
contribute to reading difficulty, especially for English learners who may not have 
comparable constructions in their own first languages (Celce-Murcia & 
Larsen-Freeman, 1983). Passive voice verb forms occur less frequently than their active 
counterparts in English (Biber, 1988). To create an empirical basis for comparisons 
across grade levels and subject areas, we noted the frequency of passive verb forms. 
Specifically, we noted the frequency of two types of passives, those including the 
agent in a "by" phrase (e.g., Water is absorbed by the ground and becomes groundwater.), 
and those without an overt agent (e.g., Large icebergs are found at the ice shelves of 
Antarctica.). Because passives without an agent appear more frequently in both 
speech and writing in English (Shintani, 1979), we wanted to determine if this bias 
for agentless passive is identifiable in fifth-grade texts. 

12 Specifically, the mean number of dependent clauses across topics ranged from 22 in Industrial 
Revolution to 39 in Declaration of Independence. 

Procedures 

We identified the passive voice verb forms in all selections and calculated the 
number per 100 sentences and per 1000 words. We then noted the number of 
passives that include a "by" phrase and calculated the mean number per 100 
sentences and per 1000 words. 

Accuracy checks were carried out at intervals throughout the analyses on a 
total of 15% of the data coded, with a target agreement rate of 90% or higher. Mean 
overall agreement was 90%, with a range of 75% to 100% within subjects. The results 
of the analyses are presented below. 

Findings 

The results of the analyses of passive voice verb forms are presented by subject 
area, followed by a summary comparing the results across subjects. Table 25 
provides the number of passives in mathematics, science, and social studies 
selections. 

Table 25

Passive Verb Forms in Three Subjects

                                  Number   No. per 100 Sentences   No. per 1000 Words
Mathematics
  All passives                        28                       4                    4
  Passives with "by" phrases           0                       0                    0
Science
  All passives                       134                      24                 18.5
  Passives with "by" phrases          19                       3                  2.6
Social Studies
  All passives                       131                      16                 12.0
  Passives with "by" phrases          13                       2                  1.2

Passive voice verb forms appear infrequently in the mathematics selections
analyzed, with 4 passives per 100 sentences and 4 passives per 1000 words. In other
words, a passive is used once every 25 sentences or once every 250 words on average. No
passives with "by" phrases were found in the mathematics selections.

In the fifth-grade science textbooks analyzed in this study, passive voice verb
forms occur more frequently than in mathematics, with an average of 24 occurrences
per 100 sentences and 19 occurrences per 1000 words. That is, a passive occurs in
every fourth sentence, or once every 54 words, on average. Only 14% of the passives included
"by" phrases. 13

In social studies selections, passive voice verb forms occurred 16 times per 100 
sentences on average, and 12 times per 1000 words. On average, one passive is used 
in every sixth sentence or in every 83 words. Ten percent of the passives included 
"by" phrases. 14 

Summary. The number of passive voice verb forms in the selections analyzed
for this research varies by subject. Mathematics selections contain the fewest, with an average of
4 passives per 1000 words; science has the most, with 19 per 1000 words, and social 
studies has 12 per 1000 words. No passive voice verb forms with "by" phrases were 
identified in the mathematics selections; very few were identified in science (3 per 
100 sentences) and social studies (2 per 100 sentences). The small number of passives 
identified in mathematics compared to science and social studies may be a function 
of the type of text analyzed, since mathematics selections are composed of numerous 
short word problems while science and social studies selections are extended texts. 
Our research indicates that passive voice verb forms are more prevalent in fifth- 
grade texts in science and social studies than in mathematics, but the overall 
frequency in those two subject areas is not great. The predominant form across 
subjects is the agentless passive; passives containing "by" phrases are infrequent at 
this grade level. 



13 While our focus here is on the data aggregated by the subject area, we note that there is some 
variability across topics in science. The topic Storms contains fewer passive voice verb forms than any 
other topic (an average number of 6 passive voice verb forms per selection as opposed to the highest 
topic average of 14.33 in Matter). There were no "by" phrases in Matter and Storms topic selections 
(see Appendix G for topic averages for passives in the three subject areas). 

14 The topic Pilgrims has the largest proportion of passive voice verb forms containing "by" phrases (7 out of
26, or 27%), with the other three topics containing less than 10% each.

Prepositional Phrases



Prepositional phrases contribute to the length and complexity of sentences. A 
prepositional phrase consists of a preposition followed by a noun phrase, as in the 
sentence: Our planet looks like a beautiful big blue marble [from a distance]. In the current
research, we investigated the frequency and length of prepositional phrases to 
determine typical usage in fifth-grade textbooks. This information will help us 
develop specific test specification guidelines for testing student ability to understand 
prepositional phrases of average length for a given content area. 

Procedures 

For each selection, we identified and counted the prepositional phrases, and we 
calculated the average number per 100 sentences and per 1000 words. We calculated 
the number of words in prepositional phrases, the mean, the range, and the standard 
deviation. 
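
The length statistics named above can be sketched in a few lines of Python. This is an illustration only: the list of phrase lengths is hypothetical, the function name is invented for the example, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the report does not state which formula was used.

```python
import statistics


def length_summary(phrase_lengths):
    """Summarize phrase lengths (in words) for one selection."""
    if len(phrase_lengths) < 2:
        raise ValueError("need at least two phrases to compute a standard deviation")
    return {
        "phrases": len(phrase_lengths),
        "total_words": sum(phrase_lengths),
        "mean": statistics.mean(phrase_lengths),
        "range": (min(phrase_lengths), max(phrase_lengths)),
        # Sample standard deviation; an assumption, not the report's stated method.
        "stdev": statistics.stdev(phrase_lengths),
    }


# Hypothetical word counts for five prepositional phrases in one selection.
# "from a distance" would contribute a length of 3.
lengths = [3, 2, 4, 5, 2]
summary = length_summary(lengths)
print(summary)
```

Summaries of this kind, computed per selection, would yield the per-selection standard deviations that the findings report as a range.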

Interrater reliability was calculated at intervals throughout the analyses on a 
total of 15% of the data coded, with a target agreement rate of 90%. Across subjects, 
overall average agreement was 91%, with a range of 85% to 96%. 

Findings 

The results of the analyses of prepositional phrases in the text selections are 
presented below by subject area, followed by a summary in which the results are 
compared across subjects. Table 26 provides the data for prepositional phrases in the 
mathematics, science, and social studies selections. 

The mathematics selections contained 716 prepositional phrases, for an average 
of 110 per 100 sentences and 102 per 1000 words. In other words, prepositional 
phrases occur in every sentence and once in every 10 words on average. Of the total 
number of words in the math selections, 37% are in prepositional phrases. 15 The 
mean number of words per prepositional phrase is 3.58; the range is 2-14, and the 
standard deviation for individual selections ranges from .82 to 1.86. 



15 The topic Ratios contains a far higher percentage of words in prepositional phrases (51%) than the
other three topics, which contain 29%, 31%, and 35% respectively (see Appendix H for topic averages
for prepositional phrases in the three subject areas).

Table 26

Prepositional Phrases in Three Subjects

                                    Number   No. per 100 Sentences   No. per 1000 Words
Mathematics
  Prepositional phrases                716                   109.5                102.2
  Words in prepositional phrases      2601                   397.7                371.1
Science
  Prepositional phrases                784                   138.5                108.0
  Words in prepositional phrases      3185                   562.7                438.6
Social Studies
  Prepositional phrases               1167                   143.7                107.3
  Words in prepositional phrases      4367                   537.8                401.5

The science selections contain 784 prepositional phrases, an average of 139 per 
100 sentences and 108 per 1000 words. That is, on average such phrases occur more 
than once per sentence and once in every 9 words. Forty-four percent of all words in 
the science selections are in prepositional phrases. The mean number of words per 
prepositional phrase is 4.09; the range is 2-17, and the standard deviation for 
individual selections ranges from 1.51 to 3.90. 

There were 1167 prepositional phrases in social studies selections, an average
of 144 per 100 sentences and 107 per 1000 words. In other words, prepositional
phrases occur about one and a half times per sentence and once in every 9 words on
average. Forty percent of the words in the selections are contained in prepositional
phrases. The mean number of words per prepositional phrase is 3.74, with a range of 
2-20 and a standard deviation for individual selections ranging from 1.69 to 2.77. 

Summary. The average number of prepositional phrases per 1000 words is
similar across content areas, ranging from 102 for mathematics to 107 for social
studies and 108 for science. Of the total number of words in the content area
selections, science has the greatest percentage, 44%, in prepositional phrases.
The mean number of words per prepositional phrase is similar across content areas,
with 3.6 for mathematics and 3.7 for social studies, and slightly higher, 4.0, for
science. Social studies and science selections contain the most prepositional phrases
per 100 sentences (144 and 139, respectively); mathematics contains only 110 per 100
sentences.

Noun Phrases



A noun phrase consists of a noun plus its modifiers, either before (e.g., a large, 
spiraling storm system ) or after (e.g., water inside the tube; various components that make 
up our globe ) the noun. Long noun phrases may contribute to difficulty in academic 
texts (Halliday and Martin, 1993). However, little is known about the incidence or 
length of noun phrases from grade level to grade level or from one content area to 
another. To investigate, we looked at the frequency and length of noun phrases in 
each selection, as described below. 

Procedures 

First, we identified and counted the number of noun phrases in each selection. 
Then we calculated the average number per 100 sentences and per 1000 words. 
Finally, we counted the number of words in noun phrases and calculated the mean, the
range, and the standard deviation. 16

As with the other analyses, reliability was calculated on approximately 15% of 
the samples rated with a target of 90% agreement. Across subjects, the average rate 
of agreement was 94%, with a range of 91% to 98%. 
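
The report does not specify how the agreement rates were computed. Assuming simple percent agreement between two raters over the same coded items, the calculation could be sketched as follows in Python. The function name and the sample codings are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items to which two raters assigned the same code."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same number of items")
    if not rater_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Hypothetical codings of ten items ("NP" = noun phrase, "X" = not a noun phrase).
rater_a = ["NP", "NP", "X", "NP", "X", "NP", "NP", "X", "NP", "NP"]
rater_b = ["NP", "NP", "X", "NP", "NP", "NP", "NP", "X", "NP", "NP"]

rate = percent_agreement(rater_a, rater_b)
print(rate, rate >= 90)  # compare against the 90% target
```

Simple percent agreement does not correct for chance agreement; a chance-corrected statistic would be a different design choice from the one assumed here.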

Findings 

The results of the analyses of noun phrases are presented by subject below, 
followed by a summary in which the results are compared across subjects. Table 27 
provides the noun phrase data for the mathematics, science, and social studies 
selections. 

Mathematics selections contain 2031 noun phrases, with an average of 311 per 
100 sentences and 290 per 1000 words. That is, there are 3 noun phrases in every 
sentence or one in every 3 words on average. Words in noun phrases are 69% of the 
total sample. The noun phrases range in length from 1 to 16 words across topics, 
with a mean length of 2 words; for individual selections, the standard deviations 
range from 1.01 to 2.56 (see Appendix I for topic averages for noun phrases in the 
three subjects). 



16 To avoid inflating our noun phrase count, we did not count constituent noun phrases separately;
water inside the tube was counted as 1 noun phrase containing 4 words.

Table 27

Noun Phrases in Three Subjects

                           Number   No. per 100 Sentences   No. per 1000 Words
Mathematics
  Noun phrases               2031                   310.6                289.8
  Words in noun phrases      4839                   740.0                690.5
Science
  Noun phrases               1633                   288.5                224.9
  Words in noun phrases      4809                   849.6                662.3
Social Studies
  Noun phrases               2619                   322.5                240.8
  Words in noun phrases      7043                   867.4                647.5

We found 1633 noun phrases in science selections, with 289 per 100 sentences 
and 225 per 1000 words. In other words, 3 noun phrases occur in every sentence or
one occurs in every 4 words on average. Of the total number of words in
science selections, 66% occur in noun phrases. The noun phrases range in length 
from 1 to 23 words across topics, with a mean length of 3 words; standard deviations 
for individual selections range from 1.87 to 3.42. 

There are 2619 noun phrases in social studies selections, with 323 per 100 
sentences and 241 per 1000 words. The number of words in noun phrases as a 
percentage of total number of words is 65%. Noun phrases range in length from 1 to 
19 words across topics, with a mean length of 3 words. Standard deviations for 
individual selections range from 1.77 to 2.79. These phrases also occur 3 times in 
every sentence or once in every 4 words on average. 

Summary. Noun phrases range in length from a mean of 2.4 words in 
mathematics to 2.7 in social studies and 2.9 words in science. Although the 
mathematics texts contain the shortest noun phrases on average, they contain the 
most noun phrases per 1000 words (290) and the highest percentage of words in 
noun phrases (69%). 



Participial Modifiers 

Participial modifiers employ verb forms to modify nouns, contributing to
syntactic complexity and a higher number of propositions in noun phrases. As with
other grammatical features investigated in this research, little empirical evidence
exists to indicate at which grade level participial modifiers may begin to play a more 
prominent role in student texts and whether there are subject area differences in 
usage. 

A participial modifier is defined here as a participial verb form used to modify
a noun. For example, in the sentence "The number of enslaved people in the colonies
reached 12,000," enslaved is a pre-nominal past participle. In the sentence "Read the
following excerpt," the word following functions as a pre-nominal present participle.
Examples of sentences with post-nominal participles include: 1) The planters owned
over half of the people held in slavery (post-nominal past participle), and 2) The number
surviving was very small (post-nominal present participle). Similar in function to a
nonrestrictive relative clause, a nonrestrictive participial modifier may precede or
follow the noun it modifies (e.g., The fugitives, frightened by the noise, fled.).

Procedures 

In each selection, we first identified and classified participial modifiers as past 
or present participles. Each set was further classified into three categories: pre- 
nominal, post-nominal, or non-restrictive. We then calculated the frequency of 
participial modifiers in general, as well as the frequencies of each type and sub-type. 
For each type and sub-type, we calculated the average frequency per sentence and 
the frequency as a percentage of the total number of words in the selection. Next we 
calculated the average number per 100 sentences and per 1000 words. Finally, we 
calculated the mean, standard deviation, range, and the subject area totals and 
averages. 

Accuracy checks were carried out at intervals throughout the analyses on a 
total of 15% of the data coded, with a target accuracy rate of 90%. The average 
overall accuracy rate was 94%, with a range of 83% to 100% across subjects. The 
results of the analyses are presented below. 

Findings 

Although we conducted a detailed analysis of participial modifiers, the 
numbers on average per selection are so small that the totals for all types and sub- 
types of modifiers are collapsed and presented in this report. Thus, the totals for 
present and past participial modifiers each represent the sum of the present and past
pre-nominal, post-nominal, and nonrestrictive participial modifiers. Table 28
provides the participial modifier data for mathematics, science, and social studies 
selections. 



Table 28

Participial Modifiers in Three Subjects

                                   Number   No. per 100 Sentences   No. per 1000 Words
Mathematics
  Present participial modifiers         7                    1.07                 1.00
  Past participial modifiers           12                    1.83                 1.71
  Total                                19                    2.9                  2.71
Science
  Present participial modifiers        39                    6.89                 5.37
  Past participial modifiers           53                    9.36                 7.29
  Total                                92                   16.25                12.67
Social Studies
  Present participial modifiers        52                    6.65                 4.78
  Past participial modifiers           89                   10.96                 8.18
  Total                               141                   17.36                12.93

The total number of participial modifiers in all mathematics selections is 19. 
Present participial modifiers occur only once per 100 sentences or per 1000 words, 
on average; past participial modifiers occur at the rate of just under 2, on average. 
The area averages and the topic averages for frequency and percentage data are too 
small to warrant individual interpretation (see Appendix J for all subject areas). 

There were 92 participial modifiers in the science selections; about 58% of them 
were past participles, occurring 9 times per 100 sentences and 7 times per 1000 
words. That is, there is one modifier in every 11 sentences or in every 137 words, on 
average. The present participles were slightly less frequent. 

Of the 141 participial modifiers in social studies selections, 89 are past
participles, constituting about 63% of the total. There are about 11 past participles per
100 sentences and 8 per 1000 words; that is, on average there is one such modifier
in every 9 sentences or in every 122 words. The averages for present participles are 7
and 5, respectively, or one modifier in every 15 sentences or every 209 words on
average.

Summary. There is a discrepancy between mathematics and the other content 
areas in the frequency of participial modifiers. The average number of participial 
modifiers per 1000 words is 13 in science and social studies texts, but only 3 in 
mathematics texts. There are more pre-nominal than post-nominal participial 
modifiers in science and social studies selections; in mathematics selections the 
number of post-nominal modifiers is greater, but the total number of participials is 
lower. The actual number of these structures is low in all three content areas at this
grade level compared to, for example, prepositional phrase modifiers. If further 
investigation shows participial modifiers to be a more salient feature of secondary 
school academic texts, the data here will provide a basis for comparison with texts at 
higher grade levels. 



Chapter Summary 

The majority of sentences across all three subject areas were simple sentences 
followed by a smaller number of complex sentences. Mathematics texts contained 
the highest percentage of simple sentences. There were very few complex syntactic 
constructions such as use of passive voice verb forms and participial modifiers in 
any of the text selections, although compared with mathematics texts, texts in 
science and social studies contained more of these grammatical features. All three 
subject areas had comparable numbers of prepositional phrases and noun phrases,
on average 1 and 3 per sentence, respectively.

CHAPTER 5:



ORGANIZATION OF DISCOURSE 

Discourse can be analyzed on multiple levels. In prior research, discourse has 
been analyzed according to (a) language functions used in conversations (e.g., 
Halliday, 1973; Halliday & Hasan, 1976; Short, 1993), (b) language functions used in 
classroom discourse and in texts (e.g., Chamot & O'Malley, 1994; Kinsella, 1997; 
Stevens et al., 2000; Bailey et al., 2004; Butler et al., 2004), and (c) ways text structures 
convey different types of information (e.g., Short, 1993; Vacca & Vacca, 1996). In the
current analyses, we drew from Vacca and Vacca (1996), analyzing the 
organizational features of text at three levels of subordination, recognizing that there 
may be finer distinctions to be made within and among these levels. Our intention 
was to capture the author's purpose for writing, as well as to characterize the 
author's presentation of main ideas and use of supporting details. 

The first level we analyzed is the overall organizational structure of each text 
selection, which is linked to the author's purpose for writing — referred to here as 
rhetorical mode. 17 Examples of rhetorical mode include exposition and persuasion. 
The second level of analysis was focused on the identification of dominant text 
feature(s), which are linked to the presentation of main ideas. Dominant features 
occur throughout a text, usually in support of the main idea (e.g., explaining the 
process of osmosis, which would be coded as explanation, or describing the cultural 
traits of an ancient culture, which would be coded as description). Examples of other 
dominant text features include classification and sequencing. The third level of 
analysis consisted of the identification of supporting features, which are typically 
used to provide key details and to establish relationships between main ideas in a 
selection. Supporting features are often embedded in prose that has a different 
dominant feature (e.g., in explaining the process of osmosis, the author may use 
examples, which would be coded as exemplification; while describing the cultural 
traits of an ancient culture, new terms may be defined, which would be coded as 
definition). Other examples of supporting features include labeling and paraphrase. 



17 For the purpose of this research, we are using the term rhetorical mode, but we acknowledge that
there are a variety of terms used across the disciplines, including discourse mode, macro-structure,
and genre-schema (Richards, Platt, & Platt, 1992), that all refer to essentially the same concept.

For mathematics, we examined only the dominant and supporting text
feature(s), since the selections consist of word problems, which do not have a 
rhetorical mode per se — a feature of extended discourse. We examined rhetorical 
mode and the dominant and supporting text features in the science and social 
studies selections. 

Also, when applicable, we analyzed tasks in the mathematics word problems 
that require language output in addition to computation. Each word problem 
concludes with a problem statement or question, which we refer to in this report as a 
task. Some tasks require computation, while others require language production 
(e.g., explaining an answer or generating a problem question). 

In addition to describing the rhetorical mode and dominant and supporting 
features in the selections, we characterized the overall frequency of their occurrence. 
To determine the frequency of occurrence of each rhetorical mode and dominant 
feature identified, we counted the number of selections in which a given rhetorical 
mode and/or dominant feature occurred. For supporting features, in addition to 
counting the number of selections in which a feature occurred, we noted if the 
features occurred just once in a selection or if there were multiple occurrences. 
Knowing whether or not there are multiple occurrences in each selection indicates 
how prevalent the features are within each text. Having both pieces of information
(i.e., the frequency with which the features occur across as well as within texts)
provides a stronger gauge of the relative frequency of each feature in the subject 
area. 
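
The two counts described above (the number of selections in which a supporting feature occurs, and the number of those selections in which it occurs more than once) can be illustrated with a short Python sketch. The data structure, function name, and sample codings are hypothetical. The sketch assumes each selection's coding is a list with one label per occurrence.

```python
from collections import Counter


def feature_frequency(codings):
    """Tally supporting features across and within selections.

    `codings` maps a selection id to a list of feature labels,
    one label per occurrence in that selection.
    Returns (selections_with_feature, selections_with_multiple_occurrences).
    """
    selections_with = Counter()
    selections_multiple = Counter()
    for labels in codings.values():
        for feature, occurrences in Counter(labels).items():
            selections_with[feature] += 1
            if occurrences > 1:
                selections_multiple[feature] += 1
    return selections_with, selections_multiple


# Hypothetical codings for three selections (not data from the study).
codings = {
    "S1": ["definition", "definition", "labeling"],
    "S2": ["definition"],
    "S3": ["labeling", "comparison", "labeling"],
}
across, within = feature_frequency(codings)
print(dict(across))
print(dict(within))
```

Keeping the two tallies separate mirrors the distinction drawn in the text between how widely a feature is distributed across texts and how prevalent it is within a text.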

Last, we identified the types of contexts in which the supporting features 
occurred (e.g., within a single sentence, across multiple sentences, or at the 
paragraph level). Knowing the linguistic contexts in which these features typically 
occur is important because it provides information needed for the development of 
item and task specifications (e.g., students should be tested on their ability to use 
and recognize linguistic features in text settings typical of each feature). We did not 
perform the same analysis for rhetorical mode and dominant features since by 
definition they occur across multiple linguistic contexts. 

In the sections below, we first describe the procedures for the analyses. We
then present the results and end the chapter with a discussion of the findings across
the three subject areas.

Procedures



After we developed and refined the approach to analyzing the text selections, 
we established an initial glossary of organizational features with definitions and 
examples based on the definitions in Butler et al. (2004), which was then used as a 
coding schema. The glossary was expanded and refined through several rounds of 
rating sample texts from each subject area (see Appendix K for a complete glossary 
of features identified in the research). Later, new features were added as they were 
identified during the actual analyses. 

In preparation for performing the analyses, we first divided the 12 selections 
for each subject area into 4 sets of 3 for the purpose of conducting reliability checks 
at regular intervals (see Chapter 1 for a description of the text selection process and 
the types of texts selected). Working independently, we read and identified the 
rhetorical mode in the science and social studies selections. Then we read the 
selections from all three subject areas and identified the text features in each 
selection, noting them in the margins next to each occurrence. We recorded and 
classified each feature as either a dominant feature or a supporting feature on a 
separate rating sheet. For supporting features, we indicated on the rating sheet 
whether the features occurred once or multiple times in a given selection and noted 
the types of contexts in which they occurred (e.g., at the sentence and/or paragraph
level). A final reading of each selection served as a check to assure the completeness 
of the identifications and the accuracy of the classification of features into the 
dominant and supporting feature categories. In addition, for mathematics, features 
of the tasks that involve language production were specified when applicable. 

Interrater reliability was calculated for the identification of rhetorical mode and 
text features, as well as for the classification of each feature as dominant or 
supporting. Two researchers rated five selections each from mathematics, science, 
and social studies, for a total of 15 out of the 36 selections; one researcher rated the 
other 21 selections. The reliability coefficients are provided in Table 29. 

The average reliability for the identification of rhetorical mode and dominant 
and supporting text features in the 15 text selections rated by both researchers was 
.92 overall across subjects, with a range of .75 to 1.00 across individual text 
selections; reliability was .93 overall for the classification of features into the 
dominant and supporting categories across subjects, with a range of .88 to 1.00 
across individual text selections.

Table 29

Reliability Coefficients by Subject

                             Identification of        Classification of
                             Rhetorical Mode &        Dominant &
Subject                      Text Features            Supporting Features
Mathematics (5)                    .95                      1.00
Science (5)                       1.00                       .95
Social Studies (5)                 .88                       .90
Overall reliability (15)           .92                       .93

Note. Numbers in parentheses indicate the number of text selections rated for
a given subject.



Findings 

In this section, we first discuss our findings in terms of the features identified 
in the selections and their overall frequency across selections by subject. We then 
discuss the frequency of the supporting features within the selections as well as the 
linguistic contexts in which these features occur. 

Mathematics. As mentioned earlier, analysis of the rhetorical mode was not 
applicable to the mathematics text selections. A discussion of the dominant and 
supporting features and the features of the mathematics tasks is included here. Table 
30 provides a list of the dominant and supporting features identified in the text 
selections and the number of selections in which each feature occurred. 

Two types of dominant features were identified: description and scenario. In the 
selections analyzed in this study, the word problems are usually composed of 
scenarios in which a real-life situation is described (e.g., the word problem set up), 
followed by a task. Students use the descriptive information provided in the 
scenario to perform calculations needed to solve the problem. All 12 mathematics 
text selections contain word problems with scenarios, while six selections also 
contain word problems that consist of descriptions of a mathematical nature (e.g., A 
fraction has a numerator of 9 and a denominator that is a prime number greater than 5. Why 
is such a fraction in its simplest form?). Examples such as these were coded as 
descriptions rather than scenarios.

Table 30

Organizational Features in Fifth-Grade Mathematics Textbook Selections

Features                           Dominant Feature a   Supporting Feature
Comparison                                                              10
Description                                         6                    0
Enumeration                                                             11
Labeling                                                                 1
Paraphrase                                                               3
Provide instruction or guidance                                          3
Scenario                                           12                    0
Sequencing                                                               9

a The numbers represent the total number of selections in which a
feature occurred as either a dominant or supporting feature.



We identified a total of six supporting features in the mathematics selections: 
comparison, enumeration, labeling, paraphrase, providing instruction or guidance, and 
sequencing, of which comparison and enumeration were the most frequently identified. 
The textbooks sometimes assist students by providing instruction or guidance in the 
form of anecdotal notes and parenthetical references (e.g., noting in parentheses at 
the end of a word problem that students should use 365 days when performing a 
calculation involving years). 

Table 31 provides a list of the features identified in the word problem tasks that 
call for language production and the number of selections in which they were 
identified across topics. 

We identified six features in the mathematics tasks that require language 
production: comparison, description, explanation, hypothesis, justification, and writing a 
problem or question. Explanation appears most frequently in the problem statement or 
question of the word problem (e.g., Does she have enough material to build a tree
house? Explain.). The features are spread evenly across the selections and textbooks, 
with the exception of instructions for students to write a problem or question. That 
feature occurs only in selections from one of the three textbooks used in the 
analyses.

Table 31

Productive Language Features Identified in Mathematics Tasks Across Topics

Features                      Total a
Comparison                          4
Description                         3
Explanation                         9
Hypothesis                          1
Justification                       2
Write problem or question           4

a Total indicates the number of selections in which the features occurred.



Science. Not unexpectedly, the rhetorical mode identified in all 12 science 
selections is exposition. Indeed, the primary goal of the science texts is to provide 
information about scientific phenomena by explaining the "how" and the "why" of 
those phenomena. Table 32 provides a list of all the dominant and supporting text 
features identified in the science selections and the number of selections in which 
each feature occurred. 

Four dominant text features were identified for science: classification, description,
explanation, and sequencing. Description is present in all 12 passages, while explanation 
occurs in five. Classification and sequencing may be topic specific, since they only 
occur in two selections each. 

We identified a total of 17 supporting text features. Of the 17, enumeration and 
labeling occur most frequently, appearing in all 12 passages, followed by comparison, 
definition, and references to other text or visual support, which each occur in 10 of the 
science selections. The textbooks provide references to other texts or visuals, such as 
reference to an activity, experiment, or a specific page in the textbook, to help make 
concepts more concrete for students and/or to scaffold material that students have 
already covered in previous chapters. 18 Analogy and simile, while typically 
considered stylistic devices in language arts, seem to fulfill a different purpose in the 
science selections by providing an authentic context to help explain a concept or to 
make a comparison (e.g., "Like a spinning skater who pulls her arms in close to her sides,
the spinning tornado gets faster and faster") (Moyer et al., 2000, p. 164).



18 These references may on occasion point the reader to a visual outside the selected text.

Table 32

Organizational Features in Fifth-Grade Science Textbook Selections

Features                           Dominant Feature a   Supporting Feature
Analogy                                                                  2
Classification                                      2                    2
Comparison                                                              10
Description                                        12                    0
Definition                                                              10
Enumeration                                                             12
Exemplification                                                          9
Explanation                                         5                    6
Labeling                                                                12
Paraphrase                                                               7
Provide instruction or guidance                                          3
Questions                                                                8
Reference to text or visual                                             10
Scenario                                                                 2
Sequencing                                          2                    5
Simile                                                                   2
Summary                                                                  2

a Number of selections in which a feature occurs as either a
dominant or supporting feature.



The sample text below is excerpted from the science selection in Appendix A, 
Measuring Mass and Volume, and is provided here to illustrate the dominant feature 
description and the supporting features definition, paraphrase, comparison, and labeling. 

The volume of an object is the amount of space it takes up [definition]. For example, 
an inflated balloon takes up more space — has greater volume — than an empty balloon. 
Volume can also be used to express capacity — that is, how much material something can 
hold [definition via paraphrase]. A swimming pool can hold a lot more water than a 
teacup can. 

The basic unit of volume in the metric system is the cubic meter (m³). But because 1 
m³ is such a large amount, the liter (L) is more commonly used. A liter is slightly larger 
than a quart [comparison]. Many soft drinks are sold in 2-L containers. Units used to 
measure smaller volumes include the centiliter (cL), which is one hundredth of a liter, 
and the milliliter (mL), which is one thousandth of a liter. 

A graduated cylinder, which is often called a graduate [labeling], is used to measure 
liquid volumes. Using a graduate is similar to using a measuring cup [comparison] 
(Badders et al., 2000, p. C11). 

In this excerpt, the notion of volume is presented through a series of 
descriptions that evoke visual images of everyday objects such as a balloon, a 
swimming pool, and soft drink containers. The supporting features interact as part 
of the dominant descriptive effort to explicate the concept of volume. 

Social Studies. To varying degrees, a type of storytelling or narrative occurs in 
all 12 selections, with events or descriptions often related to the reader in the 
third person (e.g., he was frustrated by the changes other people made in his original 
draft of the Declaration of Independence). However, this narrative voice is employed 
to present information about historical people, places, dates, and events rather 
than to tell a story. Thus, while the primary goal of the textbooks is expository, 
the use of narrative as a means of presenting information is so central to the 
social studies selections that the two rhetorical modes combine to form a special 
classification that we refer to here as exposition through the use of narration. 

A list of the dominant and supporting text features identified in the social 
studies selections is provided in Table 33. 



Table 33 



Organizational Features in Fifth-Grade Social Studies 
Textbook Selections 

Features                         Dominant     Supporting
                                 Feature^a    Feature
Classification                   -            1
Comparison                       -            6
Contradiction                    -            2
Description                      12           0
Definition                       -            9
Enumeration                      -            12
Exemplification                  -            10
Explanation                      4            7
Labeling                         -            12
Paraphrase                       -            8
Questions                        -            5
Quotation                        -            11
Reference to text or visual      -            7
Sequencing                       3            7

^a Number of selections in which a feature occurs as a 
dominant or supporting feature. 

We identified three dominant text features in the selections analyzed: 
description, explanation, and sequencing. Description is present in all 12 text selections, 
while explanation plays a dominant role in four selections and sequencing only occurs 
in three. Explanation may be topic specific, since it only occurs in selections about the 
Declaration of Independence and the Industrial Revolution, as does sequencing, which 
occurs in two selections about the Declaration of Independence and one about Pilgrims. 
This may not be surprising since learning about the Declaration of Independence 
and the Industrial Revolution requires analysis of the wording and purposes of the 
Declaration of Independence and understanding how and why different inventions 
changed the way people worked during the Industrial Revolution. It is also natural 
that sequencing would play a role in the topics Declaration of Independence and 
Pilgrims since both require an understanding of the sequence of events that led to or 
resulted in other events. Slavery, on the other hand, requires more description because 
the fifth-grade curriculum focuses on early American history, which involves 
learning about how African Americans lived and functioned in early American 
society, not the events leading to the Civil War. 

Fourteen supporting features were identified. Enumeration and labeling occur 
most frequently (in all 12 selections). Quotation also appears frequently, occurring in 
11 selections, as does exemplification, which appears in 10. Quotations are drawn 
from primary sources and serve multiple purposes: to exemplify an idea, to add 
emphasis, or to add descriptive detail (e.g., "'Saturday nights we'd slip out of 
the quarters and go to the woods,' Jones recalled." [Armento et al., 1999, p. 407]). 

The sample text below is excerpted from the social studies selection in 
Appendix A, The Industrial Revolution. It illustrates the dominant feature description 
and the supporting features of labeling, comparison, cause and effect, explanation, and 
definition. 

At the time of Britain's Industrial Revolution, the young United States was still 
mainly a land of farms. Before long, though, a British mechanic named Samuel Slater 
[labeling] would bring the Industrial Revolution to the United States. His yarn-spinning 
machine would come to represent the beginning of a new way of life for our country. 

Because of the Industrial Revolution, no other country in the world could make 
cloth as cheaply as Great Britain [comparison]. The British wanted to keep their 
profitable technology a secret. So they passed laws making it illegal to export machines 
or machine plans [cause and effect]. The people who operated machines in cotton 
factories were not even allowed to leave the country. 

In 1789 Samuel Slater memorized the plans of the British spinning machines. He had 
heard that, because of the free market in the United States, business owners there would 
pay for this new technology [explanation]. In a free market, producers of goods and 
services freely decide how to use resources in response to demand [definition]. People in 
the United States wanted to start their own business in making cloth (Banks et al., 2001, 
pp. 404-405). 

In this excerpt, the text describes the beginning of the Industrial Revolution in 
the United States through the efforts of Samuel Slater. The dominant feature is 
realized through the interaction of the supporting features such as comparison (e.g., 
contrasting Britain's technological expertise with the world's) and definition (e.g., 
defining what a free market is). 

Frequency of Occurrence of Supporting Features. An important part of 
characterizing the organizational features of texts for each subject area is 
determining how frequently the features occur. This informs test 
development by indicating which features are most critical to students when 
reading subject-matter textbooks. The frequency of rhetorical modes and dominant 
features was noted in the analyses above as either present or not present in a 
selection. In this part of the analysis, we determined the frequency of supporting 
features not only by identifying their presence, but also by noting if they occur in a 
selection just once or multiple times. Table 34 provides the frequency of occurrence 
of supporting features within and across subjects. 
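
The single-versus-multiple tally described above can be sketched in a few lines of Python. This is only an illustration, not the coding procedure used in the study: the annotation format (one `(selection_id, feature)` pair per coded occurrence), the function name `tally_occurrences`, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def tally_occurrences(annotations):
    """Count, per feature, the selections in which it occurs once vs. more than once.

    `annotations` is a hypothetical list of (selection_id, feature) pairs,
    one pair for each coded occurrence of a supporting feature.
    """
    per_selection = Counter(annotations)  # (selection_id, feature) -> occurrences
    tally = defaultdict(lambda: {"single": 0, "multiple": 0, "total": 0})
    for (_selection, feature), n in per_selection.items():
        kind = "single" if n == 1 else "multiple"
        tally[feature][kind] += 1
        tally[feature]["total"] += 1
    return dict(tally)

# Invented toy data: three selections coded for two features.
toy = [
    ("sci-01", "definition"), ("sci-01", "definition"),
    ("sci-02", "definition"),
    ("sci-03", "definition"), ("sci-03", "definition"), ("sci-03", "definition"),
    ("sci-01", "summary"),
]
result = tally_occurrences(toy)
print(result["definition"])  # {'single': 1, 'multiple': 2, 'total': 3}
print(result["summary"])     # {'single': 1, 'multiple': 0, 'total': 1}
```

Each total counts selections, not occurrences, so no cell can exceed the number of selections analyzed per subject (12).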

In the mathematics tasks that require language output, only two of the six task 
features identified occur multiple times (description and explanation). The other four 
features all occur just once per selection (see Table 35 for the frequency of features in 
mathematics tasks that require language output). Since each mathematics selection is 
a compilation of word problems, the results indicate that tasks requiring language 
production occur infrequently overall (i.e., they occur in 24 out of the 212 word 
problems analyzed, 11% of the total). 

Table 34 



Frequency of Occurrence of Supporting Features Across Subjects 

                                 Mathematics                         Science                             Social Studies
Features                         Single      Multiple    Total^a     Single      Multiple    Total       Single      Multiple    Total
                                 Occurrence  Occurrences             Occurrence  Occurrences             Occurrence  Occurrences
Analogy                          -           -           -           2           -           2           -           -           -
Classification                   -           -           -           1           1           2           -           1           1
Comparison                       1           9           10          5           5           10          1           5           6
Contradiction                    -           -           -           -           -           -           2           -           2
Definition                       -           -           -           1           9           10          5           4           9
Enumeration                      -           11          11          4           8           12          2           10          12
Exemplification                  -           -           -           2           7           9           7           3           10
Explanation                      -           -           -           -           6           6           -           7           7
Labeling                         -           1           1           3           9           12          -           12          12
Paraphrase                       3           -           3           2           5           7           2           6           8
Provide instruction or guidance  1           2           3           3           -           3           -           -           -
Questions                        -           -           -           5           3           8           4           1           5
Quotation                        -           -           -           -           -           -           4           7           11
Reference to text or visual      -           -           -           [illegible] [illegible] 10          [illegible] [illegible] 7
Scenario                         -           -           -           2           -           2           -           -           -
Sequencing                       -           9           9           2           3           5           1           6           7
Simile                           -           -           -           2           -           2           -           -           -
Summary                          -           -           -           2           -           2           -           -           -

^a Total number of selections in which features occur (12 selections maximum for each subject). 

Table 35 



Frequency of Productive Language Features in Mathematics Tasks 

Features                         Single      Multiple    Total
                                 Occurrence  Occurrence
Comparison                       4           -           4
Description                      2           1           3
Explanation                      5           4           9
Hypothesis                       1           -           1
Justification                    2           -           2
Write problem or question        4           -           4



Contexts in Which Supporting Features Occur. As mentioned above, we also 
examined the types of textual contexts in which the supporting features occur. Four 
contexts were identified in the selections analyzed: sentence (occurring in a single 
sentence), multi-sentence (occurring in two or more adjacent sentences), paragraph 
(predominant in an entire paragraph), and multi-paragraph (occurring in two or 
more adjacent paragraphs). Table 36 provides data on the contexts in which each 
supporting feature was identified across subjects. 

A list of all supporting text features identified is provided in the far left column 
with the contexts given under each subject area header. The numbers represent the 
number of text selections in which a feature was identified in a given context. Thus 
the numbers are not absolute numbers; that is, they do not represent the total 
number of times a comparison, for example, was made across all sentences. Rather, in 
mathematics, comparison was identified at the sentence level in ten selections and at 
the multi-sentence level in four. Although these counts are aggregated at the level of 
the selection and do not represent discrete occurrences of a feature, they do provide 
a sense of the types of contexts in which different features occur and the relative 
dispersion across the selections within a subject area. This information will be 
applied in our test development efforts, as frequently occurring features should be 
assessed in the types of linguistic environments in which they typically occur. 
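
The selection-level aggregation described above (counting the selections in which a feature appears in a given context, rather than counting every occurrence) can be sketched as follows. The input format, the function name `selections_per_context`, and the toy data are assumptions made for illustration; they are not taken from the study's coding materials.

```python
from collections import defaultdict

CONTEXTS = ("sentence", "multi-sentence", "paragraph", "multi-paragraph")

def selections_per_context(annotations):
    """Return {(feature, context): number of distinct selections}.

    `annotations` is a hypothetical list of (selection_id, feature, context)
    triples, one per coded occurrence; repeated occurrences of a feature in
    the same context within one selection are counted only once.
    """
    seen = defaultdict(set)
    for selection_id, feature, context in annotations:
        if context not in CONTEXTS:
            raise ValueError(f"unknown context: {context!r}")
        seen[(feature, context)].add(selection_id)
    return {key: len(ids) for key, ids in seen.items()}

# Invented toy data for two mathematics selections.
toy = [
    ("math-01", "comparison", "sentence"),
    ("math-01", "comparison", "sentence"),       # same selection, same context
    ("math-01", "comparison", "multi-sentence"),
    ("math-02", "comparison", "sentence"),
]
counts = selections_per_context(toy)
print(counts[("comparison", "sentence")])        # 2
print(counts[("comparison", "multi-sentence")])  # 1
```

Because a single selection can contribute to more than one context, the counts in a row may sum to more than the number of selections in which the feature occurs.
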

Table 36 



Contexts in Which Supporting Features Occur Across Subjects 

                                 Mathematics           Science                                     Social Studies
Features                         Sentence^a Multi-     Sentence   Multi-     Paragraph  Multi-     Sentence   Multi-     Paragraph  Multi-
                                            Sentence              Sentence              Paragraph             Sentence              Paragraph
Analogy                          -          -          1          -          1          -          -          -          -          -
Classification                   -          -          -          1          -          1          -          1          1          -
Comparison                       10         4          7          1          4          -          6          1          3          -
Contradiction                    -          -          -          -          -          -          2          -          -          -
Definition                       -          -          10         -          -          -          9          -          -          -
Enumeration                      11         7          11         2          2          -          12         3          -          -
Exemplification                  -          -          8          3          4          1          9          2          -          -
Explanation                      -          -          6          -          5          -          6          4          3          -
Labeling                         1          -          11         4          1          -          12         -          -          -
Paraphrase                       3          -          7          -          -          -          8          -          -          -
Provide instruction or guidance  3          -          3          -          -          -          -          -          -          -
Questions                        -          -          8          -          -          -          5          -          -          -
Quotation                        -          -          -          -          -          -          11         1          1          -
Reference to text or visual      -          -          10         -          -          -          7          -          -          -
Scenario                         -          -          -          1          1          -          -          -          -          -
Sequencing                       5          8          1          2          3          1          -          2          3          6
Simile                           -          -          1          -          1          -          -          -          -          -
Summary                          -          -          -          -          2          -          -          -          -          -

^a Total number of selections in which a feature was identified in the particular context for a maximum of 12 in any cell. 

The data in Table 36 show that, overall, supporting features occur most 
frequently at the sentence level across subjects, although in science, supporting 
features occur frequently at the paragraph level as well. Enumeration, labeling, and 
definition, in that order, are the most frequently occurring features at the sentence 
level. However, comparison, exemplification, explanation, and sequencing occur in a 
greater variety of contexts, depending on the subject area. These findings indicate 
that supporting features are usually embedded within the dominant features, a 
finding parallel to that in Butler et al. (2004). We will return to this point in the 
summary and in Chapter 6. 



Chapter Summary 

Contrasts occur across the three subjects at each level of analysis. The 
difference in the writer's purpose from one subject area to another is clearly evident; 
thus the rhetorical modes vary. The mathematics selections differ from the science 
and social studies selections in that the primary purpose of word problems is to 
provide a context for problem-solving practice. The science selections analyzed 
differ from the social studies selections in that they follow a more traditional 
expository form in which information is presented, explained, and then sometimes 
summarized in a fairly straightforward format. The social studies selections, on the 
other hand, use a narrative form to present information. That is, historical 
information often reads like a story, unfolding chronologically with details provided 
through the eyes of historical figures. 

A comparison of the number of different types of features occurring across 
subjects is shown in Table 37. 



Table 37 



Number of Dominant and Supporting Features 
Identified per Subject Area 

Subject                          Dominant     Supporting
                                 Feature      Feature
Mathematics                      2            6
Science                          4            17
Social Studies                   3            14

Overall, the range of features, especially supporting features, identified in the 
science and social studies text selections is broader than in mathematics. This 
difference reflects the longer and more varied nature of the science and social 
studies texts, which tend to be denser than a typical word problem because they 
introduce new concepts and explain or describe processes and events. 

Table 38 provides the comparative data for the specific dominant and 
supporting features across the three subject areas. There are similarities in the 
specific dominant features identified across the three subject areas. Description 
occurs in 30 of the 36 text selections across subjects. Explanation and sequencing occur 
in both science and social studies selections. However, as dominant features, scenario 
only occurs in mathematics and classification only occurs in science. 

Several supporting text features occur frequently across the three subjects, 
specifically enumeration, comparison, and sequencing, whereas labeling, definition, 
exemplification, paraphrase, and references to supporting text or visuals occur almost 
exclusively in science and social studies selections. 

Paraphrase appears in all three subjects, with different characteristics in each. In 
our coding of mathematics, we used paraphrase to denote the restatement of 
numbers in decimal form (e.g., Each piece of fruit is 3/100, or 0.03, sugar). Similarly, in 
science, we used paraphrase to denote restatement of different units of measurement 
(e.g., The wind speeds can reach up to 300 km/hr [about 186 mi/hr]). In social studies, 
paraphrase appears in the more traditional form in which a word or phrase is 
restated or a synonym is used in order to define a new word or to provide 
clarification to the reader (e.g., Before leaving, they wrote a compact, or agreement). 

One trend that emerged primarily in the science and social studies selections is 
the frequent use of "instructional devices" to help students access the subject matter 
in the selections. These devices may be a unique feature of academic prose, used 
possibly because of the more complex nature of academic discourse compared to 
oral language and also possibly due to the conceptual difficulty of the content. The 
instructional devices identified in the present research have some parallels to 
observations made of the oral language used by teachers in classroom settings 
(Bailey et al., 2002). For example, in science and social studies there are references to 
other parts of the textbook, visuals in almost every selection analyzed, and 
sometimes references to prior experiments and activities, all used as a means of 
scaffolding material or reminding students of resources available to them. Questions 
are used in the science and social studies selections to stimulate critical thinking or 
to preview new information. In mathematics and science, we identified a type of 
instructional device in which the author provides instruction or guidance to help 
students solve a problem or understand new information, for example, by providing 
students equivalent forms of measurement that may be used to solve a problem. 



Table 38 



Organizational Features in Fifth-Grade Textbook Selections Across Subjects 

                                 Mathematics             Science                 Social Studies
Features                         Dominant    Supporting  Dominant    Supporting  Dominant    Supporting
                                 Feature^a   Feature     Feature     Feature     Feature     Feature
Analogy                          -           -           -           2           -           -
Classification                   -           -           2           2           -           1
Comparison                       -           10          -           10          -           6
Contradiction                    -           -           -           -           -           2
Description                      6           -           12          -           12          -
Definition                       -           -           -           10          -           9
Enumeration                      -           11          -           12          -           12
Exemplification                  -           -           -           9           -           10
Explanation                      -           -           5           6           4           7
Labeling                         -           1           -           12          -           12
Paraphrase                       -           3           -           7           -           8
Provide instruction or guidance  -           3           -           3           -           -
Questions                        -           -           -           8           -           5
Quotation                        -           -           -           -           -           11
Reference to text or visual      -           -           -           10          -           7
Scenario                         12          -           -           2           -           -
Sequencing                       -           9           2           5           3           7
Simile                           -           -           -           2           -           -
Summary                          -           -           -           2           -           -

^a Number of selections in which a feature occurs as a dominant or supporting feature (12 selections 
maximum for each subject). 



As pointed out earlier, we identified the use of what are typically considered 
writing devices (analogy and simile) in science, which are used like the "instructional 
device" feature discussed above. In science, they are intended to help the reader, not 
to add stylistic flair. In addition, scenario is occasionally found in science selections. 
Much like the scenarios used in mathematics, short descriptions of real-life 
situations are embedded in science selections to exemplify a concept or idea and 
make it more vivid to the reader. The following made-up example is typical of the 
types of scenarios in science textbooks: If you moved to a new neighborhood, how would 
you make new friends? You might go introduce yourself to your new neighbors and invite 
them over for a barbecue or for tea. This example also includes a question, which is 
typical of many science selections. 

Finally, it should be noted again that supporting features are typically 
embedded in dominant features, such as description or explanation, in the service of 
expanding or adding detail to a text. Sometimes supporting features also occur in 
paragraph or multi-paragraph contexts, providing textbook authors a means of 
exemplifying concepts on a broader scale, labeling or defining new terms in greater 
detail, or paraphrasing information to ensure that ideas are clear to students. 

Taken together, the results of these analyses and the analyses presented in 
earlier chapters are critical to the creation of test specifications. The data provide a 
picture of how language is used to accomplish different goals in student textbooks 
and how textbooks attempt to provide support for students as they are reading, as 
teachers do orally while conducting lessons. We turn now to a synthesis of all the 
data into subject area text profiles. 

CHAPTER 6: 



SYNTHESIZING AND UTILIZING EMPIRICAL DATA FOR TEST 
DEVELOPMENT PURPOSES 

In Chapters 2-5 we provided empirical data that help answer the two research 
questions presented in Chapter 1. In this chapter we synthesize the results from 
those chapters to answer the second research question more explicitly. The second 
research question asks: How do texts in different subject areas compare to one another in 
terms of identified characteristics? Answering this question will be important because 
commonalities identified are candidates for general academic language proficiency 
assessment tasks and items, whereas differences are candidates for developing 
subject-specific language tasks and items. 
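
To illustrate how commonalities and differences can be separated, the sketch below applies set operations to the supporting features reported in Table 38. It is an illustration only, not part of the study's procedure: the dictionary lists the features with a nonzero supporting count in each subject as read from that table, and the helper name `unique_to` is hypothetical.

```python
# Supporting features with a nonzero count in Table 38, by subject.
supporting = {
    "mathematics": {
        "comparison", "enumeration", "labeling", "paraphrase",
        "provide instruction or guidance", "sequencing",
    },
    "science": {
        "analogy", "classification", "comparison", "definition",
        "enumeration", "exemplification", "explanation", "labeling",
        "paraphrase", "provide instruction or guidance", "questions",
        "reference to text or visual", "scenario", "sequencing",
        "simile", "summary",
    },
    "social studies": {
        "classification", "comparison", "contradiction", "definition",
        "enumeration", "exemplification", "explanation", "labeling",
        "paraphrase", "questions", "quotation",
        "reference to text or visual", "sequencing",
    },
}

# Features shared by all three subjects: candidates for general tasks.
common = set.intersection(*supporting.values())

def unique_to(subject):
    """Features found in `subject` and in no other subject."""
    others = set().union(*(f for s, f in supporting.items() if s != subject))
    return supporting[subject] - others

print(sorted(common))
# ['comparison', 'enumeration', 'labeling', 'paraphrase', 'sequencing']
print(sorted(unique_to("social studies")))  # ['contradiction', 'quotation']
```

Presence alone overstates some commonalities: labeling, for example, occurs in only 1 mathematics selection but in all 12 science and social studies selections, so frequency should be weighed alongside set membership.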

First, we provide descriptions of the linguistic characteristics of each subject 
area based on the current research. Next, we compare the linguistic features of each 
subject area, noting the commonalities and differences across subjects. Then we 
synthesize the results of the current research with those from prior CRESST research 
and other studies and apply the findings to the test development process in order to 
show how data synthesized in this chapter can be used to develop language 
assessment instruments. We turn now to a description of the linguistic 
characteristics of each subject area. 

Table 39 provides a cross-subject-area profile that lists the main linguistic 
features investigated in this study down the left-hand side and the values and 
ranges for each feature that typify the subject area text selections on the right-hand 
side. Each section of the table contains data drawn from the different analyses 
performed in the current study (e.g., descriptive features, grammatical features, and 
so on). 

Table 39 



Linguistic Profiles of Fifth-Grade Mathematics, Science, and Social Studies Text Selections^a 

                                                              Math          Science       Social Studies
Mean no. of sentences per word problem or paragraph (range)   3 (2-7)       4 (1-8)       4 (1-9)
Mean no. of words per sentence (range)                        11 (1-39)     13 (1-37)     14 (3-43)
Lexical diversity ratio                                       .43           .41           .49
Percentage of all categories of academic vocabulary words^b   10% (14%)     21% (27%)     24% (24%)
  General academic words only                                 3% (5%)       6% (11%)      3% (7%)
  Specialized academic words only                             4% (7%)       14% (14%)     9% (11%)
  Measurement words only                                      3% (2%)       1% (1%)       <1%
  Proper nouns only (specialized)                             <1%           <1%           7% (5%)
  Colloquialisms only                                         <1%           <1%           <1%
Vocabulary features
  Low-frequency words                                         8% (12%)      8% (12%)      8% (12%)
  3-or-more-syllable words                                    6% (9%)       10% (15%)     12% (16%)
  Derived words                                               2% (4%)       6% (11%)      8% (12%)
No. of unique clause connectors in each subject area          11            7             21
Avg. percentage of nominalizations per selection              <1%           2% (3%)       2% (3%)
Avg. percentage of each sentence type per selection
  Simple sentences                                            81%           61%           63%
  Complex sentences                                           17%           36%           33%
  Other sentence types                                        2%            3%            4%
Avg. percentage of dependent clauses per selection            6%            29%           28%
Mean no. of passive voice verb forms per sentence             .04           .24           .16
Mean no. of prepositional phrases per sentence                1             1             1
Mean no. of words per prepositional phrase (range)            4 (2-14)      4 (2-17)      4 (2-20)
Mean no. of noun phrases per sentence                         .03           .16           .17
Mean no. of words per noun phrase (range)                     2 (1-16)      3 (1-23)      3 (1-19)
Mean no. of participial modifiers per sentence                .03           .17           .17
Dominant organizational features
  Classification                                              0%            17%           0%
  Description                                                 50%           100%          100%
  Explanation                                                 0%            42%           33%
  Scenario                                                    100%          0%            0%
  Sequencing                                                  0%            17%           25%
Supporting organizational features^c
  Comparison                                                  67%           83%           50%
  Definition                                                  0%            83%           75%
  Enumeration                                                 92%           100%          100%
  Exemplification                                             0%            75%           83%
  Labeling                                                    0%            100%          100%
  Paraphrase                                                  17%           58%           67%
  Provide instruction or guidance                             25%           25%           0%
  Quotation                                                   0%            0%            92%
  Reference to text or visual                                 0%            83%           58%
  Sequencing                                                  75%           42%           58%

^a Numbers in this table have been rounded to the nearest whole number for percentages and the 
nearest one hundredth for decimals. ^b Percentages shown are token (type). ^c The five most frequently 
occurring supporting features in each subject area are listed here, although there is some overlap, 
resulting in a total number of 10 supporting features in the list. The percentages represent the 
percentage of selections in which a particular feature was identified. 

Linguistic Characteristics of Mathematics Textbook Selections 

The descriptive, lexical, grammatical, and discourse features of the 
mathematics textbook selections are discussed in turn below. A summary then 
follows that characterizes the general nature of the mathematics selections. 

Descriptive Features 

In mathematics selections, the mean number of sentences per word problem is 
3, with a range of 2-7 sentences per word problem. Mean sentence length is 11 words, with 
a range of 1-39 words per sentence. 

Lexical Features 



The average lexical diversity ratio for mathematics text selections is .43. That is, 
slightly over half of the words in mathematics appear more than once in a selection 
(i.e., in a set of word problems). In our analyses of academic vocabulary in 
mathematics, we found that on average 10% of all word tokens and 14% of all word 
types in the 12 text selections were identified as academic vocabulary. Typically, 
there are slightly more specialized academic and measurement words than general 
academic words. On average, specialized academic words account for 4% of total 
word tokens and 7% of total word types, whereas measurement and general 
academic words each account for about 3% of word tokens and for 2% and 5%, 
respectively, of total word types. Few colloquialisms or proper nouns 
were identified in mathematics selections. 19 
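
As a rough illustration of the lexical diversity ratio, the sketch below computes a type-token ratio, that is, the number of distinct word forms divided by the total number of words. It assumes the ratio reported here is a simple type-token ratio and uses a deliberately crude tokenizer; the function name `lexical_diversity` and the sample sentence are hypothetical, and the study's own counting rules may differ.

```python
import re

def lexical_diversity(text):
    """Type-token ratio: distinct word forms divided by total words."""
    # Crude tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

sample = "The cat saw the dog and the dog saw the cat"
print(round(lexical_diversity(sample), 2))  # 0.45 (5 types / 11 tokens)
```

A ratio of .45 means that 45% of the running words are distinct forms and the remaining 55% are repetitions of words already used, which is how a ratio of .43 corresponds to "slightly over half" of the words being repeats.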

Low-frequency vocabulary accounts for 8% of total word tokens and 12% of 
total word types on average per selection; 3-or-more-syllable words account for 6% 
of total word tokens and 9% of word types on average; and derived words account 
for 2% of total word tokens and 4% of total word types. A total of 11 different clause 
connectors were identified in the text selections. With a total frequency of 80 
occurrences across all the selections (212 mathematics word problems in total), there 
is approximately one connector in every three word problems. Of these clause 
connectors, there are 7 types identified as the more challenging adverbial dependent 
clause connectors, with an average of 9 adverbial clause connectors per 100 
sentences and 8 per 1000 words. The most frequent adverbial connector is if; other 
frequently used connectors include and, but, and when. Nominalizations account for 
less than 1% of total word types and tokens on average. 

Grammatical Features 

In mathematics selections, the majority of sentences are simple sentences (81%), 
followed by complex sentences (17%). Percentages of compound and 
compound/complex sentences are small. Approximately 27% of all clauses 
identified in mathematics selections are dependent clauses. 



19 Names of people are frequently used in mathematics, but they are not classified as academic 
vocabulary because they are inconsequential to the content being taught. In social studies, however, 
students learn the names of people, places, and things, which are considered academic vocabulary 
because they are consequential to the lessons. 

There are few passive voice verb forms (4 per 100 sentences, equivalent to 
approximately one passive verb form for every eight word problems) and no 
passive constructions using "by" in the mathematics selections analyzed. Each 
mathematics sentence contains an average of 1 prepositional phrase, with a mean of 
4 words per prepositional phrase and a range of 2-14 words in length. There are 
slightly more noun phrases on average per sentence (3), but the length of noun 
phrases is shorter than prepositional phrases (approximately 2 words each). Noun 
phrases range in length from 1-16 words. Mathematics selections contain few 
participial modifiers of any kind at the fifth-grade level, with approximately one in 
every eleven word problems. 
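
The per-word-problem equivalents quoted above follow from the per-sentence rates and the mean of 3 sentences per word problem. The sketch below shows the arithmetic; the function name `units_per_occurrence` is hypothetical, and the inputs are the rounded values from Table 39, so the results are approximate.

```python
def units_per_occurrence(rate_per_sentence, sentences_per_unit):
    """How many text units (e.g., word problems) contain one occurrence on average."""
    if rate_per_sentence <= 0 or sentences_per_unit <= 0:
        raise ValueError("both arguments must be positive")
    return 1 / (rate_per_sentence * sentences_per_unit)

# Rounded mathematics values from Table 39: 3 sentences per word problem,
# .04 passive verb forms and .03 participial modifiers per sentence.
print(round(units_per_occurrence(0.04, 3), 1))  # 8.3  -> about one per eight problems
print(round(units_per_occurrence(0.03, 3), 1))  # 11.1 -> about one per eleven problems
```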

Discourse Features 

Across subjects, we analyzed the organizational features of discourse in each 
subject area, including rhetorical mode, dominant organizational text features, and 
supporting text features. As explained in Chapter 5, since rhetorical mode is a 
feature of extended discourse and not mathematics word problems, we did not 
perform this analysis on the mathematics selections. 

Description and scenario were identified as the dominant organizational features 
in mathematics selections. All 12 selections contain word problems with scenarios, 
which consist of descriptions of real-life situations followed by a task; 6 selections 
also contain descriptions, in which the word problem setup contains a description of 
a numerical problem but no real-life scenario. 

A total of 6 supporting organizational features were identified in mathematics 
selections, among which enumeration and sequencing occur most frequently. 
Additionally, we analyzed mathematics tasks that require language production. Six 
task features were identified, among which comparison and explanation are the most 
frequent. 

Summary 

Overall, mathematics word problems at the fifth-grade level contain little 
academic vocabulary, are mostly composed of simple sentences, and do not appear 
to be grammatically complex according to the analyses presented above. The 
dominant organizational features of word problems are description and scenario, with 
several frequently occurring supporting features, such as enumeration and sequencing, 
used to provide the detail necessary for students to set up the mathematical 
computations needed to solve the word problem. The word problem below is typical 
of those analyzed here (see Appendix A for other sample word problems). 

A large can of juice contains 1.5 liters and sells for $2.09. A smaller can of the same juice 
contains 750 milliliters and sells for $0.98. Which is the better buy? (Remember: There are 
1000 milliliters in 1 liter) (Willoughby et al., 2003, p. 268). 

This word problem is typical in that it is of average length (i.e., 3 sentences), 
and the dominant organizational feature is scenario (i.e., it describes a context for 
problem solving in which students are required to compare the price and size of two 
different cans of juice). It has 3 supporting organizational features: comparison (e.g., 
the comparative adjectives smaller and better), enumeration (the size and price of two 
juice cans are enumerated across two sentences), and providing instruction and/or 
guidance (i.e., measurement equivalents are provided). The sample includes 
measurement words (e.g., liters) and a mathematics-related colloquialism (i.e., 
better buy). 



Linguistic Characteristics of Science Textbook Selections 

The descriptive, lexical, grammatical, and discourse features of the science 
textbook selections are discussed in turn below. A summary then follows that 
characterizes the general nature of the science selections. 

Descriptive Features 

In science, the mean number of sentences per paragraph is 4, with a range of 1-8 
sentences per paragraph. Mean sentence length is 13 words, with a range of 1-37 
words per sentence. 

Lexical Features 

The average lexical diversity ratio for science text selections is .41, indicating 
that as with mathematics, slightly over half the words in science are repeated in a 
typical selection. 

Academic vocabulary analyses revealed that on average approximately 21% of 
all word tokens and 27% of word types in science selections are considered 
academic vocabulary. There are typically more specialized academic words in 
science selections than other categories of academic vocabulary. Specialized words 
account for about 14% of total word types and tokens. General academic vocabulary 
makes up about 11% of total word types and 6% of total word tokens on average. 
Measurement words, colloquialisms, and proper nouns were also identified in 
science selections, but they each account for less than 1% of total words on average. 

Low-frequency vocabulary and 3-or-more-syllable words account for 
approximately 8% and 10% of all word tokens respectively and 12% and 15% of all 
word types on average. Derived words make up about 6% of all word tokens and 
11% of all word types. Eighteen different clause connectors were identified in the 
science selections. With a total of 107 occurrences across selections, there are an 
average of 9 connectors per selection. Of the 17 different types of connectors 
identified, 15 are adverbial dependent clause connectors, with an average of 18 
adverbial clause connectors per 100 sentences and 14 per 1000 words. The most 
frequent connectors are when and as; other frequent connectors include because, if, 
and and. Nominalizations account for 2% of total word tokens and 3% of total word 
types on average. 

Grammatical Features 

In science selections the majority of sentences are simple sentences (61%), 
followed by complex sentences (36%). Percentages of compound and 
compound/complex sentences are small (about 3% combined). Approximately 29% 
of all the clauses identified in science selections are dependent clauses. 

There are on average .24 passive voice verb forms per sentence (an equivalent 
of one passive verb form every fourth sentence in a typical science selection) and .03 
passive constructions with "by" per sentence. Prepositional phrases appear 
frequently, with approximately 1 prepositional phrase per sentence, a range in 
length of 2-17 words, and a mean of 4 words. There are slightly more noun phrases 
per sentence (3). The range in length of noun phrases is longer as well (1-23 words), 
with a mean of 3 words per phrase. Typically, there are .17 participial modifiers per 
sentence, or one every fifth sentence in science text selections. 

Discourse Features 

The rhetorical mode used in all 12 science selections is exposition. We identified 
four dominant organizational features: classification, description, explanation, and 
sequencing. Among the four features, description is found in all the selections, 
whereas explanation occurs in five. Classification and sequencing, on the other hand, 
may be topic specific since they only occur in selections for two of the topics. 

A total of 17 supporting features were identified, of which enumeration and 
labeling occur most frequently and can be found in all 12 selections. Other frequently 
used features include comparison, definition, and references to other text or visual 
support, which occur in 10 science selections each. 

Summary 

Overall, science selections at the fifth-grade level contain a variety of general 
and specialized academic vocabulary and are mostly composed of simple sentences. 
The sentence structures do not appear to be grammatically complex, although the 
sentences tend to be longer and more varied than in mathematics (e.g., they contain 
more passives and participial modifiers). Science selections also contain a broader 
range of dominant and supporting features than mathematics. The paragraphs 
below are excerpted from the sample science selection in Appendix A, which 
exhibits some of the features that typify the types of science texts analyzed in this 
research. 



The heaviness of each package is directly related to its mass. Mass is a measure of 
how much matter something contains. Weight is a measure of the force of gravity acting 
on a mass. So the more matter an object contains — the greater its mass — the more it will 
weigh. 

A spring scale, which is used to weigh objects, measures the effect of gravity on an 
object. To find an object's mass, you have to use a balance, like the one shown on page 
C11. 

The most common metric units used to measure are grams (g) and kilograms (kg). A 
penny has a mass of about 2 g. A kilogram is one thousand times the mass of a gram. A 
large cantaloupe has a mass of about 1 kg (Badders et al., 2000, pp. C10-C11). 

This selection is from the topic Matter and is representative of the science 
selections in the study in terms of the descriptive data and the analyses of 
vocabulary, grammar, and basic organizational features. Because it teaches students 
about measuring mass and volume, it contains a higher than average number of 
measurement words (e.g., kilogram), and also has more 3-or-more-syllable and 
morphologically derived words than the average science selection. The first 
paragraph provides examples of nominalization (e.g., the heaviness of each...), 
specialized academic vocabulary (e.g., the force of gravity), and measurement words 
(e.g., grams [g]). 

In the paragraphs shown (and in the entire selection), the rhetorical mode is 
exposition and the dominant feature is description. Supporting features include 
comparison (e.g., ...the more matter an object contains — the greater its mass...), definition 
(e.g., mass is a measure of how much matter something contains.), and reference to other 
text or visual support (e.g., ...like the one shown on page C11). In subsequent 
paragraphs, this selection contains the supporting feature sequencing (e.g., it 
provides the steps for measuring volume). 

Linguistic Characteristics of Social Studies Textbook Selections 

The descriptive, lexical, grammatical, and discourse features of the social 
studies textbook selections are discussed in turn below. A summary then follows 
that characterizes the general nature of the social studies selections. 

Descriptive Features 

The mean number of sentences per paragraph in social studies selections is 
3.98, with a range of 1-9 sentences per paragraph. Mean sentence length is 13.52 
words, with a range of 3-43 words per sentence. 

Lexical Features 

The average lexical diversity ratio for social studies text selections is .49, which 
is higher than for mathematics or science. On average approximately 24% of all 
word tokens and types in social studies selections were identified as academic 
vocabulary. There are typically more specialized academic words than any other 
category of academic vocabulary. Specialized vocabulary accounts for about 9% of 
total word tokens and 11% of total word types on average. Proper nouns make up a 
fairly high percentage of academic vocabulary (about 7% of total tokens and 5% of 
total types on average), but the use of measurement words and colloquialisms is 
rare, accounting for less than 1% of total words each in a selection. General academic 
words make up approximately 3% of total word tokens and 7% of total word types 
on average. 

Low-frequency vocabulary and derived words each account for about 8% of 
total word tokens and 12% of total word types on average. Words with 3 or more 
syllables occur more frequently, accounting for about 12% of total word tokens and 
16% of total word types on average. A total of 21 different types of clause connectors 
were identified, with 105 occurrences across all the selections and an average of 9 
connectors per selection. Seventeen of the 21 connectors are adverbial dependent 
clause connectors, with an average of 10 connectors per 100 sentences and 7 per 1000 
words. The most frequently occurring connectors are when, and, and as. Other 
frequent connectors include but and after. Nominalizations account for 2% of total 
word tokens and 3% of total word types on average. 

Grammatical Features 

The majority of sentences in social studies are simple sentences (63%), followed 
by complex sentences (33%). The number of compound and compound/complex 
sentences is small (less than 5% combined). Approximately 30% of all clauses 
identified in social studies selections are dependent clauses. 

The average number of passive voice verb forms per sentence is .16 (about one 
passive verb form every fifth sentence in social studies), and the average number of 
passive constructions with "by" is .02. Each sentence in social studies contains 
approximately 1 prepositional phrase, with a mean length of 4 words each and a 
range of 2-20 words in length. There are more noun phrases per sentence (3) on 
average than prepositional phrases, but their mean length (3 words) is shorter than 
that of prepositional phrases. The noun phrases range in length from 1-19 words. 
There are approximately .17 participial modifiers per sentence, or about one every 
fifth sentence in social studies selections, as in science. 

Discourse Features 

In all 12 social studies selections, we identified a specialized rhetorical mode 
that combines exposition with a story-telling or narrative form of presenting 
information, which we call exposition through the use of narration. Three dominant text 
features were identified: description, explanation, and sequencing. Of the three, 
description is present in all 12 selections, explanation occurs in four, and sequencing is 
found in three. Both explanation and sequencing may be topic specific, since they only 
appear in selections from two of the four topics. 



A total of 14 supporting features were identified, among which enumeration and 
labeling can be found in all 12 selections. Another two features that occur frequently 
are quotation and exemplification. 

Summary 

According to our research, fifth-grade social studies selections contain a variety 
of academic vocabulary and low-frequency, 3-or-more-syllable, and derived words. Except 
for proper nouns, though, social studies vocabulary features share many similarities 
with those of science. The selections are composed of mostly simple sentences and 
approximately 30% complex sentences. Grammar does not appear to be complex in 
social studies, although it tends to contain longer sentences, prepositional phrases, 
and noun phrases than the other two subject areas. In terms of organizational 
features, social studies is similar to science, with the exception of a few unique 
features, such as the use of quotation. The example paragraphs below, excerpted 
from the sample social studies selection in Appendix A, exemplify some of the 
features considered to be typical in social studies. 

Slater slipped out of the country and came to the United States. Soon he was hired 
by a merchant to build spinning machines in Rhode Island. By 1790 Slater had built the 
first American machines to spin cotton into yarn. 

Slater had to pay a high price for the cotton he used in his factory, which limited his 
profits. In 1793, however, an American inventor built a machine that made cotton 
cheaper to produce. His name was Eli Whitney. 

Whitney heard planters talk about how long it took enslaved workers to remove the 
stubborn seeds stuck to cotton. Whitney invented the cotton gin in ten days. Whitney's 
gin, which is short for "engine," helped workers clean up to 50 times more cotton than 
they could by hand. 

As you can see from the bar graph below, cotton production boomed after the 
invention of the cotton gin. Together, slave labor and the cotton gin made growing cotton 
more profitable. Many planters became more determined to keep slavery alive (Banks et 
al., 2001, pp. 405). 

The selection this excerpt came from is the second shortest of all the social 
studies texts; however, the features of this text, such as average sentence length and 
number of sentences per paragraph, are in line with the subject area averages, which 
is the hallmark of what makes a text "typical." This particular text selection does, 
however, contain more passive phrases with "by" than other texts (e.g., ...in houses 
built by the mill owners) and more specialized academic vocabulary (e.g., merchant, 
profitable), in part due to the topic Industrial Revolution, which contains many new 
and specialized terms for inventions. There is an example of nominalization in the 
fourth paragraph above (e.g., cotton production), a prenominal past participial 
modifier (e.g., enslaved workers), and a coordinate clause (e.g., ...and the cotton gin...). 

The rhetorical mode is exposition via narration, as the selection narrates historical 
events in the context of the lives of individuals (e.g., Whitney heard planters talk...). 
The dominant organizational feature of this selection is description since many of the 
inventions are described in terms of what they do. Supporting features include 
explanation (e.g., explains why new inventions help people do things better or more 
efficiently), comparison (e.g., compares how much faster work could be done with 
new inventions), labeling (e.g., his name was Eli Whitney), and sequencing (e.g., by 
naming the years when people invented new machines). Later in the selection (see 
Appendix A), a mill girl's diary is quoted, providing an example of the supporting 
feature quotation. 

Comparison of Linguistic Features Across Subject Areas 

In the subsections below, we briefly discuss the major sections of the cross- 
subject-area profile. 

Descriptive Features 

Across subjects, we found that the mean number of sentences ranges from 3.13 
sentences per mathematics word problem to 4.18 sentences per science paragraph. 
At the sentence level, mean sentence length is slightly shorter in mathematics (11) 
than in science (13) and social studies (14). Social studies has a slightly wider range 
of sentence lengths (3-43 words) than in mathematics (1-39) and science (1-37). 

Lexical Features 

Based on examination of type/token ratios, we found that social studies (.49) is 
slightly more lexically diverse than either mathematics (.43) or science (.41). 
However, the diversity ratios across subjects appear to be relatively low given the 
assumption that one purpose of the textbooks is to introduce new grade-appropriate 
lexical items to students. 
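
As a rough illustration of the type/token ratio reported above, the following Python sketch divides the number of unique word types by the total number of word tokens. The tokenization rule (lowercased runs of letters, digits, and apostrophes) and the function name type_token_ratio are assumptions made for this example; the report's own counting procedures may differ.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return unique word types divided by total word tokens.

    Assumption: a token is a lowercased run of letters, digits, or
    apostrophes. This is a simplification, not the study's procedure.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


# Two sentences from the science excerpt quoted earlier in this chapter.
sample = (
    "Mass is a measure of how much matter something contains. "
    "Weight is a measure of the force of gravity acting on a mass."
)
print(round(type_token_ratio(sample), 2))
```

Very short passages repeat few words and therefore yield higher ratios than full selections, so a value computed on an excerpt like this one is not comparable to the selection-level ratios reported here.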

Comparing academic vocabulary usage across subjects, we found that 
mathematics selections contain fewer academic words than science and social 
studies overall, whereas science and social studies have comparable percentages of 
academic vocabulary. In examining the percentages of academic words by 
subcategories, we observed that all subjects contain more specialized academic 
vocabulary than any other subcategory. However, the proportions of other 
subcategories of academic words vary slightly from subject to subject. Mathematics 
contains a larger percentage of measurement words than science and social studies. 
On the other hand, science has the largest proportion of general academic 
vocabulary across subjects. Although social studies does not have as many general 
academic vocabulary words, it contains the highest percentage of proper nouns. 

In terms of low-frequency vocabulary, the percentages of total word types and 
tokens are similar across subjects. Low-frequency words play a relatively minor role 
in total word counts (about 8% overall), but a slightly greater role in total word 
types (about 12%) for all three subjects. Across subjects there are more 3-or-more- 
syllable words than there are derived words, both in types and tokens. In both cases, 
however, proportions of both 3-or-more-syllable and derived words in total word 
counts/word types are similar for science and social studies (15%-16% for 3-or- 
more-syllable words, 11%-12% for derived words), but the percentages are smaller 
in mathematics (9% for 3-or-more syllable words, 4% for derived words). 

Overall, there are more unique clause connectors (types) in science and social 
studies than in mathematics, but all three subjects have some frequently occurring 
connectors in common, such as and, when, if, and but. Nominalizations appear 
infrequently across subjects. In science and social studies selections, they account for 
about 2% of total word counts and 3% of word types, and in mathematics they 
account for less than 1% of total words. 

Grammatical Features 

The majority of sentence types across subjects are simple sentences, followed 
by complex sentences. Mathematics has the highest percentage of simple sentences 
(81%), whereas in science and social studies about 60% of the sentences are simple 
sentences. Across subjects approximately 26%-29% of total clauses are identified as 
dependent clauses. 

Use of passive voice verb forms in general and passive constructions with "by" 
varies by subject, with less than 1 passive form per sentence on average across 
subjects. Passive constructions in mathematics are almost nonexistent, whereas 
passive voice verb forms occur with higher frequency in science (.24 per sentence) 
than in social studies (.16). 

With regard to prepositional phrases, the usage is the same across subjects, 
with the average number per sentence being 1. The average length of prepositional 
phrases is approximately 4 words per phrase across subjects. The average number of 
noun phrases per sentence is 3 for all subjects. 

There are few participial modifiers of any kind (e.g., past or present) in the 
fifth-grade text selections analyzed, although there are slightly more in social studies 
(approximately 17 per 100 sentences) than in science (16). Mathematics has 
considerably fewer participial modifiers than the other two subjects (3 per 100 
sentences). 

Discourse Features 

We observed many contrasts across subjects at each level of analysis. First, the 
rhetorical mode varies across subjects according to the purpose for writing. The 
primary purpose of mathematics word problems is to provide problem-solving 
contexts. The word problems are typically limited in length and thus do not provide 
the extended discourse necessary for establishing rhetorical mode. Therefore, this 
feature could not be identified in the mathematics selections. Science and social 
studies both follow an expository form to present information; however, their 
rhetorical modes differ slightly from each other: science selections use a more 
traditional, straightforward expository form, whereas social studies selections 
employ a story-telling narrative form. 

Summary 

Overall, science and social studies contain similar numbers of dominant and 
supporting features, whereas mathematics selections exhibit less variety. For 
dominant text features, we found that description occurs across all three subjects and 
was identified in 30 of the 36 text selections analyzed in the current research. 
Explanation and sequencing occur in both science and social studies, but not in 
mathematics. Scenarios, on the other hand, occur only in mathematics. 

Several supporting features were identified across subjects, the most frequent 
of which are enumeration, comparison, and sequencing. In contrast, labeling, definition, 
exemplification, paraphrase, and references to supporting text or visuals are found almost 
exclusively in science and social studies. Additionally, social studies differs from 
science and mathematics in that it has quotation as a frequent supporting feature. 

Statistical Comparison of Linguistic Features Across Subject Areas 

Key features from descriptive, lexical and grammatical text analyses were 
chosen for statistical analysis based on the degree of contrast they displayed across 
the three subjects, as shown in Table 6.4 above. The univariate analysis of variance 
(ANOVA) procedure was conducted to test the significance in mean differences 
across subjects using, as appropriate, either the mean percentage of a given text 
feature (i.e., differences in the mean number of sentences per paragraph) or a 
standardized mean value (i.e., mean ratio of number of unique word types to total 
number of word tokens). 
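
For readers unfamiliar with the procedure, the sketch below shows how a one-way ANOVA F statistic can be computed for one text feature across three subjects, together with the per-comparison alpha implied by a Bonferroni correction of the kind the report applies. The function names and the sample values are hypothetical and are not the study's data; the sketch stops at the F statistic and does not compute p-values or post hoc tests.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_alpha(alpha, n_comparisons):
    """Return the per-comparison alpha under a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical percentages of simple sentences per selection (made-up values).
mathematics = [80.0, 83.0, 79.0, 82.0]
science = [60.0, 63.0, 58.0, 62.0]
social_studies = [64.0, 61.0, 65.0, 62.0]

f_stat, df_b, df_w = one_way_anova_f([mathematics, science, social_studies])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
# Three subjects give three pairwise comparisons for a single feature.
print(f"per-comparison alpha = {bonferroni_alpha(0.05, 3):.4f}")
```

With 12 selections per subject, as in this study, the within-groups degrees of freedom would be 33 rather than the 9 produced by this shortened example.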

In every case, because we compare percentages or other standardized values, 
the contrasts we analyze across subject areas are meaningful despite differences in 
the overall number of words in each selection. The ANOVAs were conducted with 
Bonferroni corrections (a conservative adjustment) due to the large number of 
multiple comparisons that could have resulted in significant findings by sheer 
chance. With just 12 text selections for each of the subjects, the ANOVA results 
should be interpreted with caution. We therefore graphed confidence interval (CI) 
bands for each of the individual subject area bars in Figures 1 and 2 below. 20 These 
CIs are set at p<.05, that is, there is a 95% chance that the mean values for a given 
language feature fall within the band around the mean. A number of these bands are 
quite wide (e.g., percentage of participial modifiers per sentence in social studies) 
which reflects the large degree of variation across the 12 selections within a subject. 
The CI bands allow us to interpret the ANOVA results more conservatively: where 
subject means for a given text feature are found to be significantly different and CI 
bands are non-overlapping, we can be more confident that any differences detected 
across subjects truly exist. 
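
The band-overlap reasoning described above can be sketched as follows. The example assumes a t-based 95% interval computed separately from the 12 selections of one subject (11 degrees of freedom, critical value 2.201); the function names and all data values are hypothetical, and the report does not state the exact formula it used.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical t value for 11 degrees of freedom (12 selections).
T_CRIT_95_DF11 = 2.201


def ci_band(values, t_crit=T_CRIT_95_DF11):
    """Return (lower, upper) bounds of a t-based CI around the mean."""
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m - half_width, m + half_width


def bands_overlap(a, b):
    """Return True if two (lower, upper) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical lexical diversity ratios for 12 selections per subject.
social_studies = [0.47, 0.50, 0.49, 0.51, 0.48, 0.50,
                  0.49, 0.52, 0.47, 0.49, 0.50, 0.48]
science = [0.40, 0.42, 0.41, 0.39, 0.43, 0.41,
           0.40, 0.42, 0.41, 0.40, 0.43, 0.41]

ss_band = ci_band(social_studies)
sc_band = ci_band(science)
print("social studies band:", tuple(round(x, 3) for x in ss_band))
print("science band:", tuple(round(x, 3) for x in sc_band))
print("bands overlap:", bands_overlap(ss_band, sc_band))
```

A wide band reflects large variation across the selections of a subject, which is why a significant ANOVA result paired with overlapping bands calls for more cautious interpretation.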



20 The calculations for these CIs were based on the 12 text selections of each subject area (each subject 
area conducted independently) in order to be more conservative. We did this rather than rely on the 
less stringent CIs based on the combined total of 36 selections across subject areas that are 
automatically produced by the ANOVA procedure. 

Figure 1. Linguistic Profiles of Fifth-Grade Social Studies (SS), Science (Sc) and Mathematics Text 
Selections (M): Subject matter averages for descriptive and lexical data. 21 

Figure 1 presents the CI bands for the descriptive and lexical profiles. In five 
contrasts (number of sentences and words, specialized academic words, percentage of 
3-or-more-syllable and derived words), mathematics texts have significantly lower means 
than either science or social studies texts. ANOVA post hoc comparisons were 
significant at p<.001 for all but the contrast between mathematics and social studies 
for specialized academic words, which is significant at p<.05 only and has overlapping 
CI bands that suggest the difference in means in this comparison should be 
interpreted with caution. Science has significantly more general academic words than 
either mathematics (post hoc comparison p<.001) or social studies (post hoc 
comparison p<.01), although the latter two do not differ significantly from each 
other. On just one feature, the lexical diversity ratio, social studies texts have a 
significantly higher ratio of word types to word tokens than either mathematics or 
science texts (post hoc comparisons p<.001). Lexical diversity ratios for mathematics 
and science did not differ significantly. 

21 Data ordered social studies, science, mathematics to reflect overall prevalence. 

In terms of findings in the grammatical data (see Figure 2), in one key contrast 
(percentage of simple sentences), mathematics texts have a significantly higher mean 
than either science or social studies texts (post hoc comparisons p<.001). Conversely, 
mathematics has far fewer complex sentences per selection on average than either 
science or social studies (post hoc comparisons p<.001). Science and social 
studies texts do not differ from one another on these sentence structure features. 
Mathematics has a considerably lower mean percentage of passive voice verb forms 
and participial modifiers per sentence than the other two subject areas (post hoc 
comparisons between mathematics and social studies for passive voice verb forms 
p<.01; between mathematics and social studies for participial modifiers p<.001; between 
mathematics and science for both passive voice verb forms and participial modifiers 
p<.001). On average social studies texts also have significantly fewer passive voice 
verb forms than science texts (post hoc comparison p<.05), but the higher p-value and 
overlapping CI bands strongly suggest caution when interpreting this finding. 

Figure 2. Linguistic Profiles of Fifth-Grade Social Studies (SS), Science (Sc) and Mathematics 
Text Selections (M): Subject matter averages for grammatical data. 22 

Overall, these analyses show that there are statistically significant differences 
between the subjects in several areas, although some of the results should be 
interpreted with caution given the small number (12 selections per subject area) of 
text selections. However, any results at the sentence and paragraph levels are 
meaningful as the total number of sentences and paragraphs is more substantial (in 
the case of mathematics, the total number of word problems is 212). 

Synthesis with Prior CRESST Research Findings 

In Butler et al. (2004), we reviewed and synthesized the CRESST research on 
academic language in addition to other relevant research studies. We found that 
many studies could not be compared because the empirical methods used differ 
from study to study. Furthermore, they used qualitative approaches to analyze and 
describe academic language such that features of academic language across grade 
levels and subject areas are not described with enough specificity for the purposes of 
test development or articulating the trajectory of academic language learning and 
use across grade levels. In the current research we have emphasized a research 
approach that includes the development of replicable quantitative procedures and 
strong reliability. Despite these differences in the body of research on academic 
language, there are areas of overlap in the findings that are helpful in constructing a 
more detailed picture of the types of language students must manipulate in 
academic environments. 

22 Data ordered social studies, science, mathematics to be consistent with Figure 1. 

To this end, we briefly revisit the synthesis in Butler et al. (2004), which 
includes a discussion of research results from multiple sources. 23 First, we find that a 
comparison of sentence length across grades and subjects in Butler et al. (2004) is 
consistent with our findings here, showing an increase in sentence length from 
mathematics (less than 10 words) to science (approximately 13 words) to social 
studies (14 words). The use of subordinate clauses was noted in three of the seven 
studies cited in Butler et al. (2004), which crosses all three subject areas as well as 
grade levels (see Appendix L for a review of the Synthesis of Grammatical Features 
of Language Functions in Textbooks and Printed Materials from the 2004 report). 
Our findings here indicate that each typical selection in mathematics, science, and 
social studies is composed of 27%-29% dependent clauses. Future research will 
further analyze the types of dependent clauses used. 

Passive voice, nominal structures, and prepositional phrases were all cited as 
potential features of academic language in the 2004 synthesis. In the current 
research, while we found prepositional phrases and noun phrases occur in every 
sentence across subjects in the fifth-grade texts analyzed, passive voice verb forms 
appear only once in every 4 to 5 sentences for science and social studies. The use of logical 
connectors was discussed in the context of grammatical features of linguistic 
functions; many of the same types of connectors were found in the current research, 
although we have described them in this report as clause connectors. Some of these 
include if, when, and but. More specificity would be possible if the grammar of 
organizational features described in this report (e.g., comparative adjective forms 
used for the purpose of comparison, connectors used for the purposes of 
exemplification or sequencing) were to be analyzed. 

Last, in the 2004 report, we discuss the use of language functions in textbooks; 
in this report, we characterize language functions as a part of a larger category: 
organizational features of discourse. Of the 8 most frequently identified language 
functions in the 2004 report, 6 appear in the 15 most frequently identified 
organizational features in this study. They are: classification, description, comparison, 
definition, explanation, and sequencing (see p. 51 of Butler et al., 2004). Others appear 
as well, but less frequently (e.g., labeling). 

23 Note that the synthesis did not include a discussion of academic or descriptive vocabulary 
features, so these will not be reviewed here. 

In Bailey et al. (2004), several of the same features identified in the text 
selections in the current research were identified in the oral discourse of teachers. 
For example, the use of exemplification was noted as a means of supporting learning 
or presenting new academic vocabulary words during a lesson. Paraphrase was used 
by teachers "...to avoid repair... by first using academic language... they guided and 
scaffolded students' meaning-making from the outset of the interaction" (p. 32). In 
the textbook selections, paraphrase is sometimes used for the same purpose, i.e., by 
using a more familiar word after a particularly difficult word; other times it is used 
in the opposite way (i.e., by introducing a more difficult word in the paraphrase as a 
means of introducing new academic vocabulary or to define new vocabulary). At the 
university level, Chung and Nation (2003) show that similar techniques are used to 
help introduce or define new or difficult words in academic texts (e.g., by using 
parenthetic examples in texts, similar to the paraphrase examples we have described 
in this report). 

Bailey et al. (2004) also discuss a category identified in their research as 
process/application instruction, which has a function in oral language similar to two of 
the features identified in the texts: providing instruction or guidance and questions. In 
their research, teachers gave students guidance as to what to pay attention to while 
doing activities, which is similar to textbook authors referring students to prior 
chapters to look at diagrams or to remind students of concepts taught in prior 
lessons. By doing this, textbooks help students focus on relevant or helpful 
information available to them in the textbook and also help scaffold information. In 
the same process/application instruction category, examples are provided that parallel 
the current research in which teachers ask leading questions, which direct 
instruction and build anticipation of the lesson to come. This use of questioning is 
identical to many of the questions identified in the textbook selections in the current 
research, which usually are rhetorical in nature and are meant strictly to stimulate 
critical thought, activate prior knowledge, and/or contextualize new concepts. 

Finally, it should be noted that across subjects, most supporting features, such 
as enumeration and exemplification, are embedded within dominant features such as 
description, usually for the purpose of expanding or adding detail to the texts, which 
corresponds to findings in previous research (Butler et al., 2004). This interplay 
between features is an important characteristic of all texts and must be taken into 
consideration not only when teaching students how to read academic texts but also 
when designing assessments of academic language. We turn now to a discussion of 
how to apply these empirical data to the test development process. 

Implications for Test Development 

The goal of the current research is not only to create the descriptions of text 
features for each subject area provided above but also to determine which features 
should or should not be present in texts selected for assessment purposes and which 
features of the texts are critical for assessment. Therefore, the profiles can be 
discussed differentially based upon test developers' needs. Here, we will show 
which information has applications for selecting texts for test development and 
which information will play a role in the development of items and tasks. 

Text Selection 

Texts selected for general language assessment purposes should be 
representative of the types of texts all students will encounter across subject areas. In 
the present research, our goal is to develop standards-based language assessment 
prototype tasks based on authentic materials from each subject area in order to tap 
student mastery of the range of academic English that is used in the classroom. One 
part of this range of language used in the classroom is textbooks. A standards-based 
text selection process will consist of several steps: 

1. Selecting texts that reflect subject-area standards. 

2. Reviewing the selected texts against an initial set of criteria specific to 
each subject area. 

3. Reviewing the selected texts for cultural and/or other types of bias. 

4. Screening the selected texts against specific linguistic criteria for each 
subject area. 

5. Subjecting the selected texts to an expert and teacher review. 

A more detailed example of the general procedures that might be followed for 
Steps 1-3 is provided in Appendix M (General Procedures for Text Selection: Stage 
1). After Steps 1-3 have been performed, the text selections would then be typed into 
an electronic file and a series of analyses corresponding to Step 4 would be run to 
determine if the linguistic characteristics of the text selections correspond to the text 
profiles provided earlier in the chapter. 

Language features analyzed in the current research that have implications for 
Step 4 of the text selection process include: (a) the range and number of sentences 
per paragraph or word problem, (b) the range and number of words per sentence, 
(c) the lexical diversity ratio, (d) descriptive vocabulary features (percentage of low- 
frequency, 3-syllable, and derived words), and (e) the balance of different sentence 
types and clauses present in each selection. These descriptive features help form the 
basis of judging what is typical or atypical in a text selection. 
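
The Step 4 screening lends itself to a short script. The following is a minimal sketch in Python, not the procedure used in this study: the sentence splitter and tokenizer are deliberately naive, the function names are hypothetical, and the lexical diversity ratio is assumed here to be the number of different words divided by the total number of words.

```python
import re


def split_sentences(paragraph):
    # Naive splitter (an assumption): break after ., ?, or ! followed by whitespace.
    parts = re.split(r"(?<=[.?!])\s+", paragraph.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Lowercased word tokens; an apostrophe is kept inside a word (e.g., "it's").
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


def describe(paragraphs):
    """Return basic descriptive features for a non-empty list of paragraphs."""
    sentences_per_paragraph = []
    words_per_sentence = []
    tokens = []
    for paragraph in paragraphs:
        sentences = split_sentences(paragraph)
        sentences_per_paragraph.append(len(sentences))
        for sentence in sentences:
            words = tokenize(sentence)
            words_per_sentence.append(len(words))
            tokens.extend(words)
    return {
        "sentences_per_paragraph_range": (
            min(sentences_per_paragraph),
            max(sentences_per_paragraph),
        ),
        "mean_sentences_per_paragraph": sum(sentences_per_paragraph)
        / len(sentences_per_paragraph),
        "words_per_sentence_range": (min(words_per_sentence), max(words_per_sentence)),
        "mean_words_per_sentence": sum(words_per_sentence) / len(words_per_sentence),
        # Assumed definition: different words (types) over total words (tokens).
        "lexical_diversity": len(set(tokens)) / len(tokens),
    }


sample = [
    "Matter is anything that has mass and volume. "
    "Mass is a measure of how much matter something contains."
]
profile = describe(sample)
```

The sample sentences are taken from the science text in Appendix A. Values computed this way could then be compared with the ranges and averages in the text profiles to judge whether a candidate selection is typical or atypical.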

Item and Task Development 

Since textbooks are an integral part of classroom learning, the data generated in 
this report can be used to determine which linguistic features of textbooks are 
critical at the fifth-grade level for each subject area and also to help establish 
trajectories of language complexity across grade levels for each subject area. While 
the analyses performed in the current research are not exhaustive, they provide 
evidence of the prevalence of a range of features at the fifth-grade level upon which 
we can base our item and task development decisions. These features include: (a) 
nominalizations, (b) frequently-used clause connectors, (c) types and frequency of 
academic vocabulary, (d) passive constructions, (e) prepositional phrases, (f) noun 
phrases, (g) participial modifiers, and (h) frequently-used dominant and supporting 
discourse features. Table 40 provides a sample content framework for developing 
assessments of academic language proficiency, which lists a selection of the 
vocabulary, grammar, and text organization features investigated in the current 
research down the left and the three subject areas across the top. 
Table 40 

Content Framework for Developing an Assessment of Academic Language Proficiency 

                                      Mathematics   Science   Social Studies
Vocabulary
  Non-academic vocabulary a                ✓           ✓            ✓
  Academic vocabulary (AV)
    General AV (high-frequency) b          ✓           ✓            ✓
    Specialized AV (defined in context)    -           ✓            ✓
  Measurement words                        ✓           ✓            -
  Proper nouns                             -           -            ✓
  Clause connectors c                      ✓           ✓            ✓
  Nominalizations                          -           ✓            ✓
Grammar
  Noun phrases                             ✓           ✓            ✓
  Participial modifiers                    -           ✓            ✓
  Passive voice verb forms                 -           ✓            ✓
  Prepositional phrases                    ✓           ✓            ✓
Organization of Text
  Comparison                               ✓           ✓            ✓
  Definition                               -           ✓            ✓
  Description                              ✓           ✓            ✓
  Enumeration                              ✓           ✓            ✓
  Exemplification                          -           ✓            ✓
  Explanation                              -           ✓            ✓
  Labeling                                 -           ✓            ✓
  Paraphrase                               ✓           ✓            ✓
  Scenario                                 ✓           -            -
  Sequencing                               ✓           ✓            ✓

a See Appendix C for a list of the most frequently occurring words across subjects. 

b See Appendix B for a list of high-frequency general academic words that occur across subjects. 

c See Appendix D for a list of the most frequently occurring adverbial and coordinate clause 
connectors. 

If we use Table 40 as a guide for determining test content, we might select high- 
frequency non-academic and general academic vocabulary, prepositional phrases, 
and comparison, description, and sequencing for inclusion on a general assessment of 
academic language proficiency. For assessments of subject-specific language 
proficiency, like the specialized advanced placement and subject area tests high 
school students often take before going to college, we might instead focus on 
measurement words for mathematics and science, participial modifiers and passive 
voice verb forms for science and social studies, and definition and labeling for science 
and social studies. Even finer distinctions could be made if all of the features 
investigated in this research were included in such a framework. For example, the text 
organization feature quotation is not included in the framework because it occurs 
only in social studies texts, which makes it a strong candidate for inclusion in a test of 
specialized academic language knowledge in social studies. 
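
The selection logic described above can be sketched in a few lines of Python. This is an illustration only: the dictionary transcribes a subset of the rows of Table 40, and the names FRAMEWORK, general_features, and specialized_features are hypothetical rather than part of any existing tool.

```python
SUBJECTS = frozenset({"mathematics", "science", "social studies"})
SCI_SS = frozenset({"science", "social studies"})

# Feature -> subjects in which the feature is marked (subset of Table 40 rows).
FRAMEWORK = {
    "prepositional phrases": SUBJECTS,
    "comparison": SUBJECTS,
    "description": SUBJECTS,
    "sequencing": SUBJECTS,
    "participial modifiers": SCI_SS,
    "passive voice verb forms": SCI_SS,
    "definition": SCI_SS,
    "labeling": SCI_SS,
    "scenario": frozenset({"mathematics"}),
    "proper nouns": frozenset({"social studies"}),
}


def general_features(framework):
    # Features marked in every subject: candidates for a general assessment.
    return sorted(f for f, subjects in framework.items() if subjects == SUBJECTS)


def specialized_features(framework, subject):
    # Features marked for this subject but not for all three subjects:
    # candidates for a subject-specific assessment or add-on module.
    return sorted(
        f
        for f, subjects in framework.items()
        if subject in subjects and subjects != SUBJECTS
    )


general = general_features(FRAMEWORK)
math_only = specialized_features(FRAMEWORK, "mathematics")
social_studies = specialized_features(FRAMEWORK, "social studies")
```

Representing the framework as a mapping from feature to subjects makes it straightforward to add features, such as quotation, without changing the selection functions.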

The information presented in Table 40 has been sequenced in the order it was 
presented throughout the paper. For actual test development, the features may be 
re-organized into different categories; for example, grammatical features of text are 
embedded in the text organization and do not occur discretely. Therefore, students 
must be able to make meaning out of vocabulary and grammar in order to 
understand the main and supporting ideas in a text (e.g., students must comprehend 
comparative adjectives in order to understand that a comparison or contrast is being 
made; knowledge of adverbial connectors is critical to understanding a sequencing 
of events). 



Chapter Summary 

In this chapter, we summarize our findings into draft text profiles for each 
subject area and a cross-subject-area text profile and then show how the information 
can be used for test development purposes. These text profiles help to illustrate the 
differences and similarities between and across mathematics, science, and social 
studies at the fifth-grade level, in turn helping test developers select appropriate 
content for general and specialized tests of academic language proficiency. Future 
research will investigate the use of the features discussed in the current research at 
other grade levels. This information will feed into a content framework that shows 
the trajectory of language use for each feature within each content area by grade 
level. This information is particularly critical as testers work to develop individual 
items and assessments that distinguish ELs with differing levels of academic 
language proficiency from one another as well as from grade to grade. In addition, 
it will help test developers determine which combinations of grades form natural 
clusters. Up to now, testers have selected grade clusters (e.g., grades three through 
five) with little empirical evidence upon which to base their decisions. Research of 
this kind not only helps to specify test content but also helps assure that the content 
of tests is grade-level appropriate. 

In the final chapter, we turn to the implications that this research has for test 
development as well as extensions to other applications in education. 
CHAPTER 7: 



CONCLUSIONS AND RECOMMENDATIONS 

The work described in this report provides new information that will inform 
the discussion of the nature of academic language. The study also invested 
considerable effort in the creation of a methodology for conducting comprehensive 
linguistic analyses of texts. The academic vocabulary and discourse analyses in 
particular were iterative in that the processes for conducting analyses were 
developed, piloted, and revised as part of the study. The CRESST Academic English 
Language Proficiency (AELP) Guidelines for Linguistic Text Analyses (in 
preparation) used in the analyses for training will be available for future work in 
this area. This closing chapter highlights the major findings and enumerates their 
implications for language test development. Additional implications for other areas 
of educational practice are also briefly discussed. We close with methodological 
ramifications and the next steps we will undertake in language test development. 

Summary of Findings 

The text profiles presented in Chapter 6 are important to our understanding of 
what is unique and what is shared across different subject areas. The three subject- 
area text selections differed on a number of linguistic features we investigated. 
However, there were still some important similarities on basic measures of 
language, such as the average number of sentences used in a word problem or 
paragraph and the complexity of sentences as measured by the number of 
dependent clauses. Major differences across all three subject areas included the 
degree to which they made use of academic vocabulary and passive voice verb 
forms. Most differences distinguished mathematics from science and social studies; 
the latter two subject-areas were remarkably similar on most linguistic measures. 24 
Specifically, the mathematics texts had shorter sentences, a smaller repertoire of 
specialized vocabulary, fewer clause connectors, fewer complex sentences, and used 
fewer descriptions and explanations than either science or social studies. However, 
mathematics had a greater number of simple sentences and made greater use of the 
scenario organizational feature than either science or social studies. 

24 As mentioned, the word problem format in mathematics may have restricted the findings and does 
not represent all possible linguistic contexts of mathematics, especially instruction. However, we 
argue that it does represent a fair approximation of the language demands typical of mathematics 
assessment contexts. 

Implications for Language Test Development 

The findings have direct application to the selection of texts for use in test 
development, as well as for content analysis and validation of the linguistic 
demands of test items (including the stimulus, question and expected response 
components of test items). Commonalities in linguistic features across subject areas 
are candidates for assessments of general academic language (domain-general 
language), whereas differences in linguistic features are candidates for developing 
subject-area (domain-specific) assessments or subject-specific test modules to add on 
to a general academic language test. 

Implications From Descriptive Analyses 

The basic quantitative information that the descriptive analyses provided will 
allow us to select texts that meet highly specific minimum, maximum and central 
tendency criteria for sentence length and paragraph length. These basic descriptive 
criteria differed between mathematics selections and science and social studies 
selections though not between the latter two. Mathematics sentences were a couple 
of words shorter on average than science and social studies sentences, which averaged 
13 and 14 words, respectively. The range between the minimum and maximum number of words per 
sentence was greatest for social studies, but sentence lengths for all three subjects 
ranged close to 1-40 words. Math word problems were 1 sentence shorter on average 
than the science and social studies paragraphs, which averaged 4 sentences. These 
fundamental differences can be used to guide the selection of overall text length and 
sentence composition in the development of test items. 
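
A screening check against such criteria might look like the following sketch. The science values (sentences of roughly 1-40 words, averaging 13 words, in paragraphs averaging 4 sentences) come from the findings above; the tolerance values, the dictionary layout, and the function name screen are assumptions made for illustration.

```python
# Criteria for science selections. The tolerances are hypothetical and would
# need to be set by test developers.
SCIENCE = {
    "min_words": 1,
    "max_words": 40,
    "mean_words": 13,
    "mean_sentences": 4,
    "tolerance_words": 3,
    "tolerance_sentences": 1,
}


def screen(sentence_lengths_by_paragraph, criteria):
    """Return a list of reasons a candidate text looks atypical (empty if none).

    sentence_lengths_by_paragraph holds one list per paragraph, each containing
    the word count of every sentence in that paragraph.
    """
    lengths = [n for paragraph in sentence_lengths_by_paragraph for n in paragraph]
    problems = []
    if min(lengths) < criteria["min_words"] or max(lengths) > criteria["max_words"]:
        problems.append("sentence length outside range")
    mean_words = sum(lengths) / len(lengths)
    if abs(mean_words - criteria["mean_words"]) > criteria["tolerance_words"]:
        problems.append("mean sentence length atypical")
    paragraph_sizes = [len(p) for p in sentence_lengths_by_paragraph]
    mean_sentences = sum(paragraph_sizes) / len(paragraph_sizes)
    if abs(mean_sentences - criteria["mean_sentences"]) > criteria["tolerance_sentences"]:
        problems.append("paragraph length atypical")
    return problems


typical = screen([[12, 15, 9, 14], [13, 16, 11, 12]], SCIENCE)
atypical = screen([[45, 30]], SCIENCE)
```

A text that returns an empty list would pass this part of the screening; any reported reason would flag the selection for closer review rather than automatic rejection.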

Implications From Lexical Analyses 

There are a number of implications for test development that lexical-level 
analyses have generated and warrant enumerating and summarizing in further 
detail here. 

1. In general, the mathematics texts we analyzed were quite different from 
both science and social studies texts; science texts had the highest 
proportion of general and specialized academic words; social studies 
texts were the most complex in terms of demanding vocabulary (e.g., 
highest lexical diversity ratio, highest proportions of content-related 
proper nouns, 3-or-more-syllable words, derived words, and a number 
of different types of clause connectors). 

2. The relatively uniform findings across the different topics within 
subjects for most of the lexical features we examined support the use of 
these features for later subject-area generalizations in test development 
(e.g., selecting texts that mirror these lexical characteristics). 

3. A striking observation with implications for test development includes 
the contrast between rare uses of some words and the highly repetitive 
use of a small set of other words often found across all three subject 
areas (e.g., certain prepositions, determiners, and conjunctions). From a 
test development perspective, we might attempt to avoid words that are 
rarely used if the construct we wish to measure is general academic use 
of language, focusing instead on the vocabulary that forms a core of 
"must-know" words. 

4. Given that derived word forms occur in relatively low numbers across 
the subjects, we should attempt to avoid the selection of texts with large 
or inordinate numbers of such words for use in test development 
because they do not represent the norm. However, it should be noted 
that derived forms best account for words identified as academic 
vocabulary in all subjects, impressively so in the areas of science and 
social studies. Derivational formation of words can, therefore, help to 
identify academic vocabulary more systematically than any of the other 
lexical features we examined here, at least at the fifth-grade level. 

5. The relatively few similarities in academic words used across subject 
areas (just 15 academic words in common), especially the infrequent use 
of general academic vocabulary, suggest that test developers still need to 
pay close attention to the individual characteristics of each subject area 
and to the particularities of individual texts selected for assessment 
purposes. Test developers are thus faced with striking a balance 
between the adoption of a census approach (i.e., assess every feature) in 
order to capture the full range of vocabulary in all academic settings, 
and the sampling of a restricted number of general academic words 
shared across subject areas in much greater depth. 



Implications From Grammatical Analyses 

The grammatical analyses reported here were selected based on our previous 
exploratory work (Butler et al., 2004) and the literature that identified a number of 
grammatical features (e.g., passive voice verb forms) to be hallmarks of academic 
texts. Some of the analyses, however, may have been focused at too demanding a 
language level for typical fifth-grade textbooks in that use of passive voice verb 
forms and past and present participial modifiers, for example, was minimal in the 
selections in all three subject areas. While these features should not be a focus for 
test development at this level, tracking their presence is important for establishing 
an academic language trajectory. At the level of the clause and sentence, on the other 
hand, a small number of measures of length and frequency suggest important 
differences and commonalities across subjects that will need to be taken into account 
for test development purposes. 

1. While all subject-area texts in this study have a greater number of simple 
sentences than complex sentences, the mathematics texts consist almost 
exclusively of simple sentences, with science and social studies texts 
containing closer to a 60/40 split between simple and complex 
sentences. Texts selected for use on academic language proficiency 
reading tests should reflect these tendencies. 

2. Sentences in the three subjects are composed of the same number of 
phrase types: both prepositional and noun phrases are distributed 
comparably in mathematics, science and social studies with each 
sentence containing just one prepositional phrase and three noun 
phrases on average. Similarities also extend to the length of these phrase 
types. This information could be used for item specifications in test 
development. This finding, however, is surprising given that science and 
social studies texts contain longer sentences and prepositional phrases 
are generally thought to contribute to increased sentence length. This 
suggests that sentence length in science and social studies is likely 
attributable to other grammatical structures that increase sentence 
length, such as embedded clauses (e.g., relative clauses). Finer 
discrimination among clause types should be included in future profiles 
of texts. 

3. A number of grammatical features that were investigated in this study 
may only become characteristic of print academic language in later 
grades, and this finding can be taken into consideration by largely 
avoiding such grammatical features when selecting texts and creating 
items appropriate for the fifth-grade level. However, given the relatively 
greater prevalence of these features already in science and social studies 
at the fifth grade, we might predict from the current findings that these 
subject areas will be more likely than mathematics texts to contain these 
features at the higher grades. 
Implications From Organizational Features Analyses 



Ultimately, the organization of texts pulls the other analyses mentioned above 
together under the same umbrella because most features, whether grammatical or 
lexical, interact within the organizational structure of texts. The implications of this 
are multifold, beginning with the need to link relevant features with text 
organization features; for example, students must recognize comparative adjectives 
in order to understand that a comparison is being made. In addition, the different 
features must be tested in their contexts of use. As shown in Chapter 5, for example, 
many supporting organizational features are embedded within dominant features 
occurring mostly at the sentence level; however, the most frequently occurring 
supporting features (e.g., comparison and sequencing) occur in a greater variety of 
contexts. Establishing the relationships between grammatical and lexical features, as 
well as understanding the typical contexts of use for features at a particular grade 
level and subject area, enables test developers to organize the content of tests in 
meaningful ways. Indeed, test organization should reflect language in use, not 
discrete points of language divorced from context. 

1. Specific areas of concern for test developers include the differences in 
text organization across subject areas. In our research we found that 
science and social studies texts make broader use of a range of 
organizational features than mathematics. 25 Frequently used features 
shared by these two subject areas must be considered for general tests of 
academic language proficiency since these subject areas reflect a more 
substantial portion of students' reading loads, as opposed to 
mathematics word problems. 

2. Conversely, areas of overlap across the three subject areas indicate core 
features that students must be able to recognize and interpret in the texts 
they read across subject areas. These features include comparison, 
description, enumeration, paraphrase, and sequencing. Texts selected for test 
development should reflect these features, and test items should tap 
students' abilities to grasp the meaning and purpose of the features in a 
given text. 



25 The absence of some organizational features in mathematics textbooks may unfortunately be 
consistent with the recent concerns of mathematicians who have called for the greater use of 
generalization and inclusion of proofs in mathematics education (e.g., Schoenfeld, 1994; Kaput, 
Schoenfeld, and Dubinsky, 1996). Word problems would seem to be particularly well positioned to 
allow for the generalization of mathematical concepts across contexts and it is hoped that textbook 
writers (following reformed content standards) could make use of this opportunity in the future. 
3. Our research also points to the complexity of many of the organizational 
features; for example, definition is provided in multiple ways: through 
traditional definition (e.g., a dog is a type of animal), through labeling (e.g., 
one type of popular domestic animal is called a dog), through paraphrase (e.g., 
a type of popular pet, dogs, can be found in most American homes), and 
classification (e.g., a chihuahua, a cocker spaniel, and a boxer are all dogs). 
Comparisons are made for the purpose of describing, explaining, and 
exemplifying. Test developers should consider the multiple layers of 
occurrence with text features when creating test specifications and 
subsequently when writing test items. Many of these complexities in 
language represent levels of language proficiency (i.e., some subtleties 
of the features may be understood and produced more frequently at 
lower levels of proficiency, while others may be more commonly used 
by advanced level ELs). 

4. Another area of interest for test developers identified in the current 
research is the frequent use of features that assist students with their 
reading (e.g., paraphrase, providing instruction or guidance, and references to 
other text or visuals). As mentioned in Chapter 6, these features are used 
in oral classroom contexts as well by teachers when they present and 
explain academic material. Therefore, test developers should consider 
the inclusion of items that tap students' ability to understand when 
assistance is being offered and whether students are able to use that 
assistance effectively. 



Additional Implications 

A potentially useful extension of our work is to inform test developers in the 
content areas. As they develop tests of mathematics, science and social studies 
content, these test developers should also consider the appropriate language level 
for their items. Research of the type reported here provides empirical grade-level 
information that can inform test development decisions. Along similar lines, the 
findings reported here can be utilized by textbook and curriculum developers as 
well as professional development programs in order to typify the level of language 
demand EL students are expected to master for the successful reading of subject- 
area material. 
Methodological Ramifications 



Number of Text Selections per Subject Area 

The number of text selections in this study, while limited from a quantitative 
perspective, provided a rich data source for the type of descriptive and basic 
inferential statistical analyses conducted. The text selection process we devised to 
facilitate in-depth linguistic analyses meant that for science and social studies we 
deliberately chose to maintain text selections in their topic entirety rather than take a 
census of a larger number of randomly sampled single paragraph selections across 
the textbooks. While a census may have been amenable to the mathematics textbook 
format, in which there are few discourse-level organizational features, restricting the 
data to a large number of unrelated paragraphs would not have allowed for the 
characterization of text features that capture rhetorical devices and language 
functions, for example, in science and social studies. 

The linguistic analyses in this study were labor intensive and were part of the 
development of a process for characterizing the academic language used in 
textbooks. Now that a process is in place with specific guidelines for the analyses, 
future work across grade levels can be carried out more efficiently and should allow 
for the use of larger numbers of text selections. The data will then be amenable to 
more sophisticated inferential and other statistical techniques. 

Incorporation of Additional Grammatical Features 

The grammatical analyses conducted in this research have focused primarily 
on those English language structures that the literature suggested would be most 
challenging for English learners reading English texts (e.g., passive constructions, 
participial modifiers). While few constructions of these types appeared in any 
frequency in the fifth-grade textbooks we analyzed, documenting the limited 
emergence of these features at the fifth grade will be important in establishing cross- 
grade level developmental markers for academic English. For future analyses at this 
and earlier grade levels, however, we suggest beginning with a comprehensive 
descriptive analysis of the grammatical features evident in the texts. Analyses of 
texts in terms of more basic and commonly occurring grammatical structures such as 
simple and complex declarative forms, interrogative forms, negative forms, and 
description of tense and aspect will help ensure a more comprehensive grammatical 
characterization of the texts. 



Next Steps 



Prototype Task Development 

Future CRESST work will focus on utilizing the linguistic profiles for creating 
test specifications, including guidelines for text selection and prototype task/item 
writing. As mentioned in Chapter 6, once it has been determined that a text and 
associated tasks fit into the parameters established in the current research, they will 
undergo both internal and external reviews by curriculum and/or language experts 
and subject area teachers. External reviews will provide a combination of focus 
group data and questionnaire data and will include a sensitivity review (to detect 
biases in terms of gender, race, etc.), a text and item review to assure that the texts 
and items are topically and linguistically representative of the types that teachers 
typically use in their classrooms, and a language review to catch any additional 
linguistic anomalies or concerns that may exist. Based on this feedback, texts will be 
retained or rejected, and items can be either rejected or modified. 

Extension to Additional Grade Levels 

The results of the analyses have implications for applications of the research 
methodology to different grade levels or grade clusters, particularly higher grade 
levels where textbooks may include far greater amounts of such lexico-grammatical 
features as nominalizations, greater numbers of and diversity in clausal connectors, 
passive verb constructions, prepositional phrases, and participial modifiers. For 
example, we expect that the use of passive voice verb forms will increase with each 
grade level; however, currently we do not have empirical evidence showing at 
which grade the comprehension and use of passive voice becomes critical for young 
readers, as well as whether there are frequency differences among subclassifications 
of passive constructions at different grade levels. Research such as this will lead to 
the identification of trajectories of language use by grade level and in each subject 
area, valuable not only in testing but also in materials and curriculum development 
and professional development. 
APPENDIX A 



Example Texts from Mathematics, 
Science, and Social Studies 



Mathematics Texts 

A large can of juice contains 1.5 liters and sells for $2.09. A smaller can of the 
same juice contains 750 milliliters and sells for $0.98. Which is the better buy? 
(Remember: There are 1000 milliliters in 1 liter). 26 

Willoughby et al., 2003, p. 268 



The Last Federal Trust Company pays $0.06 for each dollar that you keep in a 
savings account for a year. If you keep $250 in a savings account there for one year, 
how much will they pay you? 27 

Willoughby et al., 2003, p. 86 



When a new video game was released, a large toy store sold twice as many 
games on the 1st day as on the 4th day. On the 3rd day, they sold 134 games. On the 2nd 
day, they sold 27 more games than on the 3rd day and 52 less than on the 4th day. 
How many games were sold on the 1st day? 28 

Greenes et al., 2002, p. 541 



26 Text excerpt from Willoughby et al., SRA/McGraw-Hill Mathematics, Grade 5. Copyright © 2003 
by SRA/McGraw-Hill Companies. Reproduced with permission of The McGraw-Hill Companies. 

27 Text excerpt from Willoughby et al., SRA/McGraw-Hill Mathematics, Grade 5. Copyright © 2003 
by SRA/McGraw-Hill Companies. Reproduced with permission of The McGraw-Hill Companies. 

28 Text from Houghton-Mifflin Mathematics, Grade 5 Student Book. Copyright © 2002 by Houghton- 
Mifflin Company. Reprinted by permission of the publisher. All rights reserved. 

What if Jessica's tower was 11/12 yard after 3 weeks? This was 1/6 yard
taller than it was after 2 weeks. The height at 2 weeks was 1/4 yard taller than it was 
after the first week. How tall was the tower after the first week? 29 

Maletsky et al., 2001, p. 327 



29 From HARCOURT MATH, Grade 5, Student Edition. Copyright © by Harcourt, Inc. Included by 
permission of the publisher. 

Science Text



Measuring Mass and Volume 

It's a cold day, and you and a friend are standing at a bus stop. You've been 
shopping, and you each have a package to hold. One package is quite heavy; the 
other is lighter but is larger and more bulky. Which package would you choose to 
hold? 

Like everything around you, the packages are made up of matter. Matter is 
anything that has mass and volume. In fact, the problem of which package is easier 
to hold involves these two physical properties — mass and volume. As seen in the 
activities on pages C6 to C8, these properties can be measured. To review and 
practice your skills for measuring these and other properties, read pages H6 to H9 in 
the Science and Math Toolbox. 

The heaviness of each package is directly related to its mass. Mass is a 
measure of how much matter something contains. Weight is a measure of the force 
of gravity acting on a mass. So the more matter an object contains — the greater its 
mass — the more it will weigh. 

A spring scale, which is used to weigh objects, measures the effect of gravity 
on an object. To find an object's mass, you have to use a balance, like the one shown 
on page C11.

The most common metric units used to measure are grams (g) and kilograms 
(kg). A penny has a mass of about 2 g. A kilogram is one thousand times the mass of 
a gram. A large cantaloupe has a mass of about 1 kg. 

Other units are also used for measuring mass in the metric system. For 
example, the mass of a very light object could be measured in milligrams (mg). One 
milligram is equal to one thousandth (1/1000) of a gram. 

The volume of an object is the amount of space it takes up. For example, an 
inflated balloon takes up more space — has greater volume — than an empty balloon. 
Volume can also be used to express capacity — that is, how much material something 
can hold. A swimming pool can hold a lot more water than a teacup can. 

The basic unit of volume in the metric system is the cubic meter (m³). But
because 1 m³ is such a large amount, the liter (L) is more commonly used. A liter is
slightly larger than a quart. Many soft drinks are sold in 2-L containers. Units used 
to measure smaller volumes include the centiliter (cL), which is one hundredth of a 
liter, and the milliliter (mL), which is one thousandth of a liter. 

A graduated cylinder, which is often called a graduate, is used to measure 
liquid volumes. Using a graduate is similar to using a measuring cup. 

Suppose you want to know how much water or some other liquid is in a 
container of some kind. First you pour the liquid from the container into a graduate. 
Then you measure the level of the liquid against the scale marked on the side of the 
graduate. 

There are two methods for finding the volume of a solid. One method is used 
for finding volumes of solids that have regular geometric shapes, such as cubes,
spheres, and rectangular blocks. For any solid with a regular shape, you can
measure such dimensions as length, width, height, and diameter. Then you can
calculate the volume of the solid by substituting the measurements in a
mathematical formula. For example, the volume of a rectangular block can be found
by multiplying its length times its width times its height. The formula for this
calculation is below. V = l × w × h.

Many solids do not have a regular shape. A rock, for example, is likely to 
have an irregular shape. The volume of these kinds of solids can be found by using 
the water displacement method. 

Suppose you want to use the water displacement method to find the volume of a 
rock, such as the one shown in the picture. The first step is to find a graduate large 
enough to hold the rock. Next, you fill the graduate about one-third full with water. 
Then you lower the rock into the graduate, as shown. 30 

Badders et al., 2000, pp. C10-C12 



30 Text from Houghton-Mifflin Science: Discovery Works, Grade 5 Student Book. Copyright © 2000 
by Houghton-Mifflin Company. Reprinted by permission of the publisher. All rights reserved. 

Social Studies Text



The Industrial Revolution 

The new machine that Nathan Appleton described had been invented in 
Great Britain. In the late 1700s British inventors and businesses brought about the 
changes in industry and technology that became known as the Industrial 
Revolution. The Industrial Revolution changed the way goods were made. Goods 
that had been made by hand in homes or workshops were now made by machines, 
often in factories. 

Before the Industrial Revolution, women and children slowly spun yarn and 
wove cloth by hand. The first British factories used water-powered machines to spin 
cotton yarn and weave cloth. After the Industrial Revolution, production increased 
and costs decreased. 

At the time of Britain's Industrial Revolution, the young United States was 
still mainly a land of farms. Before long, though, a British mechanic named Samuel 
Slater would bring the Industrial Revolution to the United States. His yarn-spinning 
machine would come to represent the beginning of a new way of life for our 
country. 

Because of the Industrial Revolution, no other country in the world could 
make cloth as cheaply as Great Britain. The British wanted to keep their profitable 
technology a secret. So they passed laws making it illegal to export machines or 
machine plans. The people who operated machines in cotton factories were not even 
allowed to leave the country. 

In 1789 Samuel Slater memorized the plans of the British spinning machines. 
He had heard that, because of the free market in the United States, business owners 
there would pay for this new technology. In a free market, producers of goods and 
services freely decide how to use resources in response to demand. People in the 
United States wanted to start their own business in making cloth. 

Slater slipped out of the country and came to the United States. Soon he was 
hired by a merchant to build spinning machines in Rhode Island. By 1790 Slater had 
built the first American machines to spin cotton into yarn. 

Slater had to pay a high price for the cotton he used in his factory, which 
limited his profits. In 1793, however, an American inventor built a machine that 
made cotton cheaper to produce. His name was Eli Whitney. 

Whitney heard planters talk about how long it took enslaved workers to 
remove the stubborn seeds stuck to cotton. Whitney invented the cotton gin in ten 
days. Whitney's gin, which is short for "engine," helped workers clean up to 50 
times more cotton than they could by hand. 

As you can see from the bar graph below, cotton production boomed after the 
invention of the cotton gin. Together, slave labor and the cotton gin made growing 
cotton more profitable. Many planters became more determined to keep slavery 
alive. 

The cotton gin helped create a plentiful supply of cotton. However, the
United States did not have Great Britain's water-powered machines, called "power 
looms." The looms wove cloth more quickly and cheaply than Slater's machines. 

Like Samuel Slater and Eli Whitney, Francis Cabot Lowell helped spread the 
Industrial Revolution in the United States. In 1810 Lowell, a New England merchant, 
toured several cloth-making factories in Great Britain. He decided to build a factory 
of his own. In 1813 Lowell and his partners built our country's first power loom, in 
Waltham, Massachusetts. For the first time all stages of cloth-making — from 
spinning cotton into thread to weaving yarn into cloth — happened under one roof. 

The swift waters of the Charles River powered the machines. The diagram on 
the next page shows how the water-wheel spun big leather belts. The belts, in turn, 
kept the machines moving. 

Lowell died in 1817, but his business partners later built several textile mills 
next to the Merrimack River in Massachusetts. They also built a town, which they 
called Lowell, around the mills for the workers. It was the first planned town for 
workers to be built in the United States. 

Mostly unmarried women between the ages of 15 and 19 worked at the mills 
in Lowell and other towns. Few jobs were open to women then. Therefore, many 
were glad to get the work, although they had long and tiring days. 

The women from New England who worked at Lowell were called "mill 
girls." They lived in boarding houses built by the mill owners. In their spare time, 
the women attended lectures and reading clubs. Some also wrote poetry and stories 
for the Lowell Offering, a magazine published by the mill girls. 

A mill girl spent 12 to 14 hours a day working at her machine, six days a 
week. The noise was often deafening. Lucy Larcom complained of "the buzzing and 
hissing pulleys and rollers and spindles." Read the following excerpt from a Lowell 
mill girl's letter home to her father. What rule did she have to follow? Why do you 
think that rule was important? 31 

Banks et al., 2001, pp. 404-406 



31 Text excerpt from Banks et al., McGraw-Hill United States: Adventures in Time and Place, Grade 
5. Copyright © 2001 by McGraw-Hill Companies. Reproduced with permission of The McGraw-Hill 
Companies. 

APPENDIX B



Academic Words Common across All Three Subject Area Selections 

contained 
continued 
equal (adjective) 
example 
explains 
express (verb) 
in order to * 
increase (verb) 
km (kilometer) 
population 
pound (weight) 
produce (verb) 
products 
separate 
suppose 

APPENDIX C

The Twenty Most Frequently Occurring Words by Subject Area

Mathematics                  Science                      Social Studies
Word    No. of Occurrences   Word    No. of Occurrences   Word    No. of Occurrences
the                    410   the                    585   the                    857
of                     258   a                      244   to                     355
a                      187                          236   of                     286
to                     154   is                     185   and                    277
how                    145   and                    154   in                     261
in                     118   in                     150   a                      229
and                    114   to                     147   they                   127
is                     112   water                  144   were                   125
for                    112   are                     74   was                    125
each                    83   as                      74   had                    105
she                     76   can                     71   for                     93
he                      75   that                    70   their                   91
on                      67   on                      60   that                    90
much                    60   an                      53   he                      82
was                     59   you                     52   on                      75
what                    58   from                    52   by                      64
many                    51   or                      51   as                      64
did                     45   it                      50   from                    63
about                   44   this                    47   people                  62
if                      43   mass                    45   or                      53
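
A frequency list of this kind can be produced by splitting each selection into word forms and tallying them. The following Python sketch illustrates the idea under stated assumptions: the function name `most_frequent_words` and the tokenizing pattern are hypothetical, the sample passage is two sentences from the science text in Appendix A, and the sketch is not claimed to be the counting procedure used in this study.

```python
import re
from collections import Counter


def most_frequent_words(text, n=20):
    """Return the n most frequent lowercase word forms with their counts."""
    # Assumption: a word is a run of letters, optionally followed by an
    # apostrophe and more letters (e.g. "it's"); numerals are ignored.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(n)


# Two sentences from the science selection in Appendix A.
sample = (
    "Matter is anything that has mass and volume. "
    "Mass is a measure of how much matter something contains."
)

top_words = most_frequent_words(sample, n=3)
for word, count in top_words:
    print(f"{word:<10}{count}")
```

Because inflected forms such as "contain" and "contains" are tallied separately in this sketch, a real analysis would need to decide whether to count word forms or word families before comparing lists across subject areas.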

APPENDIX D

Clause Connectors across Subject Areas

Connector        Mathematics   Science   Social Studies

Adverbial dependent clause connectors:
after                 X           X            X
(al)though                        X            X
as                                X            X
as if                                          X
because                           X            X
before                X           X            X
even before                                    X
even though                       X            X
ever since                        X
if                    X           X            X
once                              X
since                             X            X
so                    X           X            X
so that               X                        X
until                             X            X
when                  X           X            X
where                             X
whether                                        X
while                 X           X            X

Coordinate clause connectors:
and                   X           X            X
but                   X                        X
or                    X           X            X
nor                   X                        X

APPENDIX E



Sentence Types 



Aggregate Counts of Sentence Types in Subject Areas by Topic 



Table E1

Aggregate Counts of Sentence Types in Mathematics by Topic

                                              Topic Totals
Statistic                             Decimals  Fractions  Multiplication  Ratio  Area Total
Simple                                     143        130             130    128         531
Complex                                     21         30              30     30         111
Compound                                     2          5               1      2          10
Compound/complex                             1          0               1      0           2
% of total that are simple               85.62      78.79           80.25  80.00       81.19
% of total that are complex              12.57      18.18           18.52  18.75       16.97
% of total that are compound              1.20       3.03            0.62   1.25        1.53
% of total that are compound/complex      0.60       0.00            0.62   0.00        0.31
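
The percentage rows in Table E1 follow from the count rows: each sentence-type count is divided by the total number of sentences in the same column. A minimal Python sketch of that arithmetic for the Decimals and Area Total columns is given below. The function and variable names are hypothetical, and the last digit of a computed value may differ from the table depending on how the original figures were rounded.

```python
# Sentence-type counts copied from the Decimals and Area Total columns of Table E1.
counts = {
    "Decimals": {"simple": 143, "complex": 21, "compound": 2, "compound/complex": 1},
    "Area Total": {"simple": 531, "complex": 111, "compound": 10, "compound/complex": 2},
}


def sentence_type_percentages(type_counts):
    """Return each sentence type's share of all sentences, as a percentage."""
    total = sum(type_counts.values())
    return {name: 100.0 * n / total for name, n in type_counts.items()}


percentages = {topic: sentence_type_percentages(c) for topic, c in counts.items()}

for topic, shares in percentages.items():
    print(topic)
    for name, value in shares.items():
        print(f"  % of total that are {name}: {value:.2f}")
```

The same function applies unchanged to the science and social studies counts in Tables E2 and E3.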



Table E2

Aggregate Counts of Sentence Types in Science by Topic

                                            Topic Totals
Statistic                             Matter  Plants  Storms  Water Cycle  Area Total
Simple                                    84     104      86           77         351
Complex                                   57      48      48           47         200
Compound                                   4       2       3            1          10
Compound/complex                           1       0       4            0           5
% of total that are simple             57.53   67.53   60.99        61.60       62.01
% of total that are complex            39.04   31.17   34.04        37.60       35.34
% of total that are compound            2.74    1.30    2.13         0.80        1.77
% of total that are compound/complex    0.68    0.00    2.84         0.00        0.88

Table E3

Aggregate Counts of Sentence Types in Social Studies by Topic

                                                    Topic Totals
                                      Declaration of  Industrial
Statistic                               Independence  Revolution  Pilgrims  Slavery  Area Total
Simple                                            85         138       156      139         518
Complex                                           81          57        56       64         258
Compound                                           3           4         3       13          23
Compound/complex                                   3           0         1        6          10
% of total that are simple                     49.42       69.35     72.22    62.61       64.03
% of total that are complex                    47.09       28.64     25.93    28.83       31.89
% of total that are compound                    1.74        2.01      1.39     5.86        2.84
% of total that are compound/complex            1.74        0.00      0.46     2.70        1.24



Topic Averages for Sentence Types in Subject Areas 

Table E4

Topic Averages for Sentence Types in Mathematics

                                              Topic Averages
Statistic                             Decimals  Fractions  Multiplication  Ratio  Area Average
Simple                                   47.67      43.33           43.33  42.67         44.25
Complex                                   7.00      10.00           10.00  10.00          9.25
Compound                                  0.67       1.67            0.33   0.67          0.83
Compound/complex                          0.33       0.00            0.33   0.00          0.17
% of total that are simple               86.10      78.87           78.93   0.80         80.95
% of total that are complex              11.91      18.09           19.62   0.19         17.14
% of total that are compound              1.25       3.04            0.72   0.01          1.54
% of total that are compound/complex      0.74       0.00            0.72   0.00          0.37


Table E5

Topic Averages for Sentence Types in Science

                                            Topic Averages
Statistic                             Matter  Plants  Storms  Water Cycle  Area Average
Simple                                 28.00   34.67   28.67        25.67         29.25
Complex                                19.00   16.00   16.00        15.67         16.67
Compound                                1.33    0.67    1.00         0.33          0.83
Compound/complex                        0.33    0.00    1.33         0.00          0.42
% of total that are simple             56.99   66.62   60.52        61.42         61.39
% of total that are complex            39.70   31.98   34.11        37.51         35.83
% of total that are compound            2.65    1.39    2.16         1.08          1.82
% of total that are compound/complex    0.65    0.00    3.21         0.00          0.96

Table E6

Topic Averages for Sentence Types in Social Studies

                                                    Topic Averages
                                      Declaration of  Industrial
Statistic                               Independence  Revolution  Pilgrims  Slavery  Area Average
Simple                                         28.33          46        52    46.33         43.17
Complex                                        27.00          19     18.67    21.33          21.5
Compound                                        1.00        1.33         1     4.33          1.92
Compound/complex                                1.00           0      0.33        2          0.83
% of total that are simple                     48.41       68.82     71.31    62.11         62.66
% of total that are complex                    46.81       28.60     26.80    28.81         32.76
% of total that are compound                    1.89        2.11      1.41     6.01          2.86
% of total that are compound/complex            1.71        0.00      0.47     2.96          1.28







APPENDIX F 



Clause Types 



Aggregate Counts of Clause Types in Subject Areas by Topic 



Table F1

Aggregate Counts of Clauses in Mathematics by Topic

                                                      Topic Totals
Statistic                                     Decimals  Fractions  Multiplication  Ratio  Area Total
No. of sentences                                   167        165             162    160         654
No. of clauses                                     234        222             223    233         912
No. of dep. clauses                                 60         55              60     67         242
Dependent clauses as a % of total clauses        25.64      29.77           26.91  28.76       26.54
No. of coord. clauses                                7          2               1      6          16
Coordinate clauses as a % of total clauses        2.99       0.90            0.45   2.58        1.75
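
The two percentage rows in Table F1 are ratios of the dependent and coordinate clause counts to the total clause count in the same column. The short Python sketch below reproduces the Area Total column as an illustration; the helper name `share_of_clauses` is hypothetical and the figures are copied from the table.

```python
# Area Total column of Table F1 (mathematics selections).
total_clauses = 912
dependent_clauses = 242
coordinate_clauses = 16


def share_of_clauses(count, total):
    """Express a clause count as a percentage of all clauses."""
    return 100.0 * count / total


dependent_pct = share_of_clauses(dependent_clauses, total_clauses)
coordinate_pct = share_of_clauses(coordinate_clauses, total_clauses)

print(f"Dependent clauses:  {dependent_pct:.2f}% of total clauses")
print(f"Coordinate clauses: {coordinate_pct:.2f}% of total clauses")
```

Applying the same ratio to the topic columns is a quick way to check individual cells against the reported percentages.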



Table F2

Aggregate Counts of Clauses in Science by Topic

                                                    Topic Totals
Statistic                                     Matter  Plants  Storms  Water Cycle  Area Total
No. of sentences                                 146     154     141          125         566
No. of clauses                                   219     211     208          186         824
No. of dep. clauses                               67      55      60           60         242
Dependent clauses as a % of total clauses      30.59   26.07   28.85        32.26       29.37
No. of coord. clauses                              6       2       7            1          16
Coordinate clauses as a % of total clauses      2.74    0.95    3.37         0.54        1.94

Table F3

Aggregate Counts of Clauses in Social Studies by Topic

                                                            Topic Totals
                                              Declaration of  Industrial
Statistic                                       Independence  Revolution  Pilgrims  Slavery  Area Total
No. of sentences                                         174         200       216      222         812
No. of clauses                                           298         269       291      329        1187
No. of dep. clauses                                      116          65        69       89         339
Dependent clauses as a % of total clauses              38.93       24.16     23.71    27.05       28.56
No. of coord. clauses                                      8           4         6       18          36
Coordinate clauses as a % of total clauses              2.68        1.49      2.06     5.47        3.03



Topic Averages for Clause Types in Subject Areas 

Table F4

Topic Averages for Clauses in Mathematics

                                                      Topic Averages
Statistic                                     Decimals  Fractions  Multiplication  Ratio  Area Average
No. of sentences                                 55.67      55.00           54.00  53.33         54.50
No. of clauses                                   78.00      74.00           74.33  77.67         76.00
No. of dep. clauses                              20.00      18.33           20.00  22.33         20.17
Dependent clauses as a % of total clauses        25.66      24.92           26.99  28.07         26.41
No. of coord. clauses                             2.33       0.67            0.33   2.00          1.33
Coordinate clauses as a % of total clauses        3.20       0.94            0.50   2.75          1.85



Table F5

Topic Averages for Clauses in Science

                                                    Topic Averages
Statistic                                     Matter  Plants  Storms  Water Cycle  Area Average
No. of sentences                               48.67   51.33   47.00        41.67         47.17
No. of clauses                                 73.00   70.33   69.33        62.00         68.67
No. of dep. clauses                            22.33   18.33   20.00        20.00         20.17
Dependent clauses as a % of total clauses      30.16   26.29   28.87        32.41         29.43
No. of coord. clauses                           2.00    0.67    2.33         0.33          1.33
Coordinate clauses as a % of total clauses      2.81    0.97    3.51         0.64          1.98

Table F6

Topic Averages for Clauses in Social Studies

                                                            Topic Averages
                                              Declaration of  Industrial
Statistic                                       Independence  Revolution  Pilgrims  Slavery  Area Average
No. of sentences                                       58.00       66.67        72       74         67.67
No. of clauses                                         99.33       89.67        97   109.67         98.92
No. of dep. clauses                                    38.67       21.67        23    29.67         28.25
Dependent clauses as a % of total clauses              38.98       24.24     23.76    26.95         28.49
No. of coord. clauses                                   2.67        1.33         2        6             3
Coordinate clauses as a % of total clauses              2.71        1.52      1.98     5.51          2.93

APPENDIX G



Topic Averages for Passives in Subject Areas 



Table G1

Topic Averages for Passives in Mathematics

                                                                   Topic Averages
Statistic                                                  Decimals  Fractions  Multiplication  Ratios  Area Average
No. of clauses                                                69.33      70.33           62.00   73.00         68.67
No. of sentences                                              55.67      55.00           54.00   53.33         54.50
No. of passive voice verb forms                                0.67       3.33            2.67    2.67          2.33
No. of passive voice verb forms per clause                     0.01       0.05            0.04    0.04          0.03
No. of passive voice verb forms per sentence                   0.01       0.06            0.05    0.05          0.04



Table G2

Topic Averages for Passives in Science

                                                                 Topic Averages
Statistic                                                  Matter  Plants  Storms  Water Cycle  Area Average
No. of clauses                                              73.00   70.33   69.33        62.00         68.67
No. of sentences                                            48.67   51.33   47.00        41.67         47.17
No. of passive voice verb forms                             14.33   12.33    6.00        12.00         11.17
No. of passive voice verb forms per clause                   0.20    0.17    0.09         0.19          0.16
No. of passive voice verb forms per sentence                 0.29    0.24    0.13         0.30          0.24
No. of passive "by" phrases                                  0.00    3.33    0.00         3.00          1.58
% of clauses with "by" phrases                              0.00%   4.65%   0.00%        4.79%         2.36%
% of passive voice verb forms that include "by" phrases     0.00%  29.28%   0.00%       19.52%        12.20%
No. of passive "by" phrases per sentence                     0.00    0.06    0.00         0.08          0.03

Table G3

Topic Averages for Passives in Social Studies

                                                                         Topic Averages
                                                           Declaration of  Industrial
Statistic                                                    Independence  Revolution  Pilgrims  Slavery  Area Average
No. of clauses                                                      99.33       89.67     97.00   109.67         98.92
No. of sentences                                                    58.00       66.67     72.00    74.00         67.67
No. of passive voice verb forms                                     10.00        9.67      8.67    15.33         10.92
No. of passive voice verb forms per clause                           0.10        0.11      0.09     0.14          0.11
No. of passive voice verb forms per sentence                         0.17        0.14      0.12     0.20          0.16
No. of passive "by" phrases                                          0.33        0.33      2.33     1.33          1.08
% of clauses with "by" phrases                                      0.31%       0.42%     2.23%    1.19%         1.09%
% of passive voice verb forms that include "by" phrases             1.85%       4.17%    20.24%    8.75%            9%
No. of passive "by" phrases per sentence                             0.01        0.01      0.03     0.02          0.02

APPENDIX H



Topic Averages for Prepositional Phrases 
in Subject Areas 



Table H1

Topic Averages for Prepositional Phrases in Mathematics

                                        Topic Averages
Statistic                       Decimals  Fractions  Multiplication  Ratio  Area Average
No. of pps*                        58.33      68.33           49.00  63.00         59.67
No. of words in pps               187.33     204.67          159.00 316.00        216.75
Min no. of words in pp              2.00       2.00            2.00      2             2
Max no. of words in pp              9.33       6.00            7.33      9             9
Mean no. of words in pps            3.19       2.98            3.26   4.89          3.58
SD of no. of words in pps           1.44       1.01            1.15   1.48          1.27
Words in pps as % of words        31.46%     35.41%          28.91% 50.69%        36.62%
Mean no. of pps per clause          0.84       0.97            0.81   0.87          0.87
Mean no. of pps per sentence        1.07       1.24            0.91   1.19          1.10

*pps = prepositional phrases



Table H2

Topic Averages for Prepositional Phrases in Science

                                      Topic Averages
Statistic                       Matter  Plants  Storms  Water Cycle  Area Average
No. of pps*                      68.67   72.33   52.33        68.00         65.33
No. of words in pps             282.33  287.67  230.00       261.67        265.42
Min no. of words in pp            2.00    2.00    2.00         2.00          2.00
Max no. of words in pp           14.67   11.00   10.33        11.67         11.92
Mean no. of words in pps          4.11    4.00    4.40         3.85          4.09
SD of no. of words in pps         2.52    1.85    2.07         3.15          2.40
Words in pps as % of words      41.89%  44.56%  40.36%       49.59%        44.10%
Mean no. of pps per clause        0.94    1.03    0.76         1.13          0.97
Mean no. of pps per sentence      1.41    1.43    1.14         1.72          1.42

*pps = prepositional phrases

Table H3

Topic Averages for Prepositional Phrases in Social Studies

                                              Topic Averages
                                Declaration of  Industrial
Statistic                         Independence  Revolution  Pilgrims  Slavery  Area Average
No. of pps*                              93.00       97.00     99.33    99.67         97.25
No. of words in pps                     343.67      351.67    374.67   385.67        363.92
Min no. of words in pp                       2           2         2        2          2.00
Max no. of words in pp                      12          11        15       20         20.00
Mean no. of words in pps                  3.69        3.63      3.77     3.88          3.74
SD of no. of words in pps                 1.96        1.88      2.09     2.32          1.87
Words in pps as % of words              41.09%      39.11%    41.85%   39.28%           40%
Mean no. of pps per clause                0.94        1.08      1.03     0.92          0.99
Mean no. of pps per sentence              1.62        1.46      1.40     1.36          1.46

*pps = prepositional phrases

APPENDIX I



Topic Averages for Noun Phrases 
in Subject Areas 



Table I1

Topic Averages for Noun Phrases in Mathematics

                                        Topic Averages
Statistic                       Decimals  Fractions  Multiplication  Ratio  Area Average
No. of nps*                       186.67     169.67          157.00 163.67        169.25
No. of words in nps               399.33     393.67          372.67 447.33        403.25
Min no. of words in np              1.00       1.00            1.00   1.00          1.00
Max no. of words in np              8.33       9.67            9.67  12.00          9.92
Mean no. of words in nps            2.12       2.33            2.38   2.80          2.41
SD of no. of words in nps           1.34       1.62            1.47   2.13          1.64
Mean no. of nps per clause          2.70       2.41            2.61   2.24          2.49
Mean no. of nps per sentence        3.41       3.10            2.95   3.06          3.13

*nps = noun phrases



Table I2

Topic Averages for Noun Phrases in Science

                                      Topic Averages
Statistic                       Matter  Plants  Storms  Water Cycle  Area Average
No. of nps*                     141.67  158.67  111.67       132.33        136.08
No. of words in nps             453.33  433.67  373.00       343.00        400.75
Min no. of words in np            1.00    1.00    1.00         1.00          1.00
Max no. of words in np           14.67   17.33   13.33        12.33         14.42
Mean no. of words in nps          3.20    2.74    3.41         2.63          3.00
SD of no. of words in nps         2.84    2.22    2.77         2.31          2.54
Mean no. of nps per clause        1.95    2.26    1.62         2.14          1.99
Mean no. of nps per sentence      2.92    3.12    2.39         3.21          2.91

*nps = noun phrases

Table I3

Topic Averages for Noun Phrases in Social Studies

                                              Topic Averages
                                Declaration of  Industrial
Statistic                         Independence  Revolution  Pilgrims  Slavery  Area Average
No. of nps*                             206.00      216.33    210.33   240.33        218.25
No. of words in nps                     544.67      611.00    555.67   636.33        586.92
Min no. of words in np                    1.00        1.00      1.00     1.00          1.00
Max no. of words in np                   13.33       14.00     15.33    14.67         14.33
Mean no. of words in nps                  2.67        2.82      2.64     2.66          2.70
SD of no. of words in nps                 2.19        2.44      2.20     2.24          2.27
Mean no. of nps per clause                2.07        2.41      2.19     2.19          2.22
Mean no. of nps per sentence              3.55        3.25      2.97     3.26          3.26

*nps = noun phrases







APPENDIX J 



Topic Averages for Participial Modifiers 
in Subject Areas 



Table J1

Topic Averages for Participial Modifiers in Mathematics

                                        Topic Averages
Statistic                       Decimals  Fractions  Multiplication  Ratio  Area Average
No. of words in selection         596.00     572.00          550.00 618.00        584.00
No. of sentences                      56         55              54     53         54.50
Frequency of Participial Modifiers
  Present                           1.00       0.67            0.67      0          0.58
  Past                              1.00       0.33            1.67   1.00          1.00
  All participials                  2.00       1.00            2.33   1.00          1.58
Frequency of Participial Modifiers per sentence
  Present                           0.02       0.01            0.01      0          0.01
  Past                              0.02       0.01            0.03   0.02          0.02
  All participials                  0.04       0.02            0.04   0.02          0.03
Participial Modifiers as % of total words
  Present                              0          0               0      0             0
  Past                                 0          0               0      0             0
  All participials                     0          0               0      0             0

Table J2

Topic Averages for Participial Modifiers in Science

                                      Topic Averages
Statistic                       Matter  Plants  Storms  Water Cycle  Area Average
No. of words in selection         2018     646     570          531        605.08
No. of sentences                   146      51      47           42         47.17
Frequency of Participial Modifiers
  Present                            4       2       5            5          3.25
  Past                              21       5       2            4          4.42
  All participials                  25       6       8            8          7.67
Frequency of Participial Modifiers per sentence
  Present                         0.03    0.04    0.12         0.12          0.07
  Past                            0.14    0.10    0.05         0.08          0.09
  All participials                0.17    0.14    0.17         0.19          0.17
Participial Modifiers as % of total words
  Present                            0       0    0.01         0.01          0.01
  Past                            0.01    0.01       0         0.01          0.01
  All participials                0.01    0.01    0.01         0.02          0.01



Table J3

Topic Averages for Participial Modifiers in Social Studies

                                              Topic Averages
                                Declaration of  Industrial
Statistic                         Independence  Revolution  Pilgrims  Slavery  Area Average
No. of words in selection               842.67      901.33    705.25   985.67        906.50
No. of sentences                         58.00       66.67     59.75    74.00         67.67
Frequency of Participial Modifiers
  Present                                 2.33        5.67      3.00     5.67          4.33
  Past                                    1.33        6.67      5.00    15.67          7.42
  All participials                        3.67       12.33      7.75    21.33         11.75
Frequency of Participial Modifiers per sentence
  Present                                 0.04        0.09      0.05     0.07          0.06
  Past                                    0.02        0.10      0.09     0.20          0.10
  All participials                        0.06        0.19      0.14     0.28          0.17
Participial Modifiers as % of total words
  Present                                    0        0.01         0     0.01             0
  Past                                       0        0.01      0.01     0.02          0.01
  All participials                           0        0.01      0.01     0.02          0.01

APPENDIX K



Glossary of Organizational Features: 
Terms, Definitions, and Examples 32 



The glossary is divided into two sections: dominant and supporting features, 
and mathematics task features. The mathematics task features list consists of only 
those features unique to mathematics tasks (e.g., writing a problem statement). 
Terms are listed alphabetically in each section, followed by a definition and at least 
one example. If an outside source was consulted for the definition, the source is cited 
next to it. The following sources (and their abbreviations) were used in developing 
the definitions: Cambridge International Dictionary (CID) (2001); Content Area Reading 
(CAR) (Vacca & Vacca, 1996); and The Random House College Dictionary (RHCD) 
(1988). Additionally, prior CRESST research was consulted and is cited if applicable. 
If no source is cited, the term was defined by the authors for the purposes of this 
report. 

Dominant and Supporting Text Features 

Analogy: 

"A partial similarity between like features of two things, on which a comparison
may be based: the analogy between a heart and a pump" (RHCD, p. 48). In the example
below, an analogy is created between the amount of salt water on Earth and the
amount that would be contained in a similarly proportioned 1-liter bottle.

Example: 

As you saw in the last lesson, most of the water on Earth is salt water. Suppose that 
all of the Earth's water just fills a 1-liter bottle. Of that liter, 972 mL would be salt 
water. Only 28 mL would be fresh water. . . (Frank et al., 2000, p. B26).* 33 



32 The features in this glossary were identified in the current research; therefore, the glossary should 
not be considered an exhaustive list of all possible text features. 

33 Text examples marked with an asterisk were taken from the selections used in this study. The 
remaining examples were taken from other chapters in the same textbooks. 
Classification:

To group or to divide things into groups according to their type (CID). In the 
example below, the author classifies different types of energy waves according to 
how they are perceived (e.g., light waves are seen; infrared waves are felt). 

Example: 

Energy from the sun travels in waves, as shown in the illustration below. There are several 
kinds of waves. Each kind carries a different amount of energy. We see some of the waves as 
visible light. We feel infrared waves as heat, and ultraviolet waves tan or burn the skin. The
sun even produces radio waves, which we hear as radio or TV static. Some of the sun's 
energy, such as X rays, is harmful to life on Earth. But the atmosphere keeps most of the 
harmful energy from reaching Earth's surface (Frank et al., 2000, p. B117). 



Comparison: 

"Pointing out likenesses (comparison) and/or differences (contrast) among facts, 
people, events, concepts, and so on" (CAR, p. 254). In the first sample, comparison 
and contrast dominate the paragraph, which discusses the differential impact of 
having or not having atmosphere. Sometimes comparisons are implied, in that 
statements showing differences or similarities may be made without the explicit use 
of comparative grammar, as in example two below. 

Example 1: 

Without the greenhouse effect, Earth would be a much colder place - too cold to
support most forms of life. Earth would be more like the Moon, which has no 
atmosphere. Without an atmosphere, there is no greenhouse effect. So the Moon's 
surface gets much colder than any place on Earth, as low as -173°C (-279°F). The 
atmosphere keeps Earth's average surface temperature at about 14°C (57°F) (Badders 
et al., 2000, p. E15). 

Example 2: 

A toy store finds that about 65% of its customers are less than 18 years old. At an 
electronics store next door, about 5/8 of the customers are under 18.... (Greenes et 
al., 2002, p. 541).* 



Contradiction: 

To show, state, or illustrate a direct opposition between things or an inconsistency 
(RHCD). In the example below, the author points out the contradiction between 
Thomas Jefferson's criticisms of slavery and the fact that he owned slaves. 

Example: 

Jefferson owned several slaves in his lifetime and lived in a slave-owning colony. Yet 
he often spoke out against slavery. "Nothing is more certainly written in the book of 
fate than that these people are to be free," he wrote (Banks et al., 2001, p. 314).* 
Definition:



To say what the meaning of something, especially a word, is (CID). Some definitions 
are sentence level and some are extended definitions that may consist of single or 
multiple paragraphs. In the example below, a definition is provided for a key word 
in the paragraph. 



Example: 

An empire is a conquered land of many people and places governed by just one
ruler (Boehm et al., 2002, p. 101).



Description: 

"Providing information about a topic, concept, event, object, person, idea, and so on 
(facts, characteristics, traits, features), usually qualifying the listing by criteria such 
as size or importance" (CAR, p. 254). The first example is a description of different
types of clouds and includes an embedded supporting feature, an explanation that
tells why some clouds look darker than others. The second
example provides factual information (e.g., description of what was sold) needed to
solve a word problem.

Example 1: 

Clouds look different depending on what they are made of. Water-droplet clouds tend to 
have sharp, well-defined edges. If the cloud is very thick, it may look gray, or even black. 
That's because sunlight is unable to pass through. Ice-crystal clouds tend to have fuzzy, less 
distinct edges. They also look whiter (Moyer et al., 2000, p. 122). 

Example 2: 

Orange juice and lemonade were sold during intermission. Three fifths of the sales 
were orange juice and 35% were lemonade. Which drink was less popular? (Greenes 
et al., 2002, p. 539). 



Enumeration: 

To name things separately, one by one (CID); often used for the purpose of 
providing examples or grouping items. Enumeration can occur within or across 
sentences and may involve a "listing" of two or more items. In example one, a list of 
materials used to decorate floats is provided. This is an example of enumeration 
embedded in a description, the dominant feature of the word problem. Example two 
shows an instance where only two items are enumerated. 
Example 1:

The 111th annual Rose Parade in Pasadena, California, had 56 floats decorated
entirely with flowers, seeds, vegetables, and even seaweed. After the parade the 
floats were lined up for exhibition. If the space needed for each float was 185 feet, 
how far did people walk to see all the floats? (Greenes et al., 2002, p. 114). 

Example 2: 

Chet had $20. He paid $8.95 for admission and $5.25 for lunch. Will he be able to buy 
a hat for $10? (Maletsky et al., 2001, p. 58).* 



Exemplification: 

To provide an example(s) for something that has been defined or discussed in a 
sentence or passage. Below, the author provides two examples of energy that are 
derived from fuel sources. 

Example: 

The sun is the source of most energy on Earth, but where does the sun's energy come 
from? On Earth, energy often comes from fuel. For example, burning gas or coal 
produces energy. But the sun's energy doesn't come from burning fuels. It comes 
from the fusing, or combining, of small particles to form larger ones (Frank et al.,
2000, p. B116).



Explanation: 

"Showing how facts, events, or concepts (effects) happen or come into being because 
of other facts, events, or concepts (causes)... Showing the development of a problem 
and solution(s) to the problem" (Vacca & Vacca, 1996, pp. 254-255). In the example 
below, the author explains how clouds form. 

Example: 

What has to happen for a cloud to form? The Explore Activity was a model of how 
clouds form. Clouds are made up of tiny water droplets or ice crystals. The air is 
filled with water vapor. When the air is cooled, the water vapor condenses. That is, 
the water molecules clump together around dust and other particles in the air. They 
form droplets of water (Moyer et al., 2000, p. 122). 



Labeling: 

To produce a term corresponding to a given definition (Butler et al., 2004). In the 
example below, the term "emperor" is produced after the definition is given. 
Example:

An empire is a conquered land of many people and places governed by just one 
ruler. That ruler is called an emperor (Boehm et al., 2002, p. 101). 



Paraphrase: 

A strategy used to ensure the reader's understanding of a key phrase or word, 
whereby the key phrase or word is rephrased, often in a simpler way. In the 
example, the author provides a simpler word for "fusing." 



Example: 

It comes from the fusing, or combining, of small particles to form larger ones (Frank 
et al., 2000, p. B116). 



Providing instruction or guidance: 

Some texts refer to other lessons, provide suggestions, or give other types of 
guidance. In the first example, the text reminds students of a prior lesson. In the 
second example, students are given the unit of measure to use to solve the problem. 

Example 1: 

When water evaporates, remember, it leaves behind the material it contained (Moyer 
et al., 2001, p. 362).* 

Example 2: 

For the past 15 years, Tanya has jogged 10 miles per day. How many miles has she 
jogged altogether? Use 365 days per year (Maletsky et al., 2001, p. 158).* 



Questions: 

Questions are embedded in the texts for multiple purposes, including to stimulate 
critical thinking, to introduce new topics, to contextualize a topic in an everyday 
setting (see example one), and to review or call attention to prior lessons. In example 
two, the question acts as the topic sentence, introducing the reader to the content 
about to be discussed. 

Example 1: 

It's a cold day, and you and a friend are standing at a bus stop. You've been 
shopping, and you each have a package to hold. One package is quite heavy; the 
other is lighter but is larger and more bulky. Which package would you choose to 
hold? (Badders et al., 2000, p. C10).*
Example 2:

What did the Declaration say that made it so powerful? The purpose of the document 
was to explain to the world why the colonies had to separate from Great Britain.... 
(Banks et al., 2001, p. 316).*



Quotation: 

Citations from primary sources may be used for multiple purposes, such as to 
explain a concept or to describe an event or place. In the first example, the quotation 
is a part of the content being taught. In the second example, the quote is used for the 
purpose of exemplification. 

Example 1: 

When Jefferson wrote, "We hold these truths to be self-evident," he meant that there 
are truths that should be clear to everyone (Banks et al., 2001, p. 316).* 

Example 2: 

Christianity inspired the slaves to compose spirituals, religious songs expressing
strong desire for a better life. One included these lines: "There's a better day
a-coming. Go sound the jubilee." The tradition of spirituals was one way black
communities were strengthened (Armento et al., 1999, p. 410).*



Reference to text or visual: 

Texts often refer students to different parts of a text in order to provide support for a 
lesson or concept. In the first example, the text refers students to a diagram that 
illustrates a written description. The second example refers students to an activity. 

Example 1: 

. . .The holds below the deck of the Mayflower were stuffed with barrels of salted beef 
and bread. The holds also contained pigs, chickens, and goats. The diagram above 
shows that the space below was very cramped for the more than 100 passengers on 
board. ... (Banks et al., 2001, p. 187).* 

Example 2: 

...The activity that is on page E65 uses water in a bottle to model a tornado. The 
spinning water is shaped like a tornado. But unlike the water, air in a tornado spins 
upward (Badders et al., 2000, p. E72).* 



Scenario: 

To provide a scene for the purpose of exemplifying, aiding in conceptual 
understanding, or providing a context for problem solving. In the first example, a 
scenario is provided to introduce the topic of the lesson (e.g., matter). The second 
example is a classic mathematics word problem scenario. 

Example 1: 

It's a cold day, and you and a friend are standing at a bus stop. You've been 
shopping, and you each have a package to hold. One package is quite heavy; the 
other is lighter but is larger and more bulky. Which package would you choose to 
hold? (Badders et al., 2000, p. C10).*

Example 2: 

Luisa and her brother are going to plant a garden that is 8 meters long and 6 meters 
wide. They each want to take care of a separate area. If they share equally, how much 
area will her brother get? (Willoughby, Bereiter, Hilton & Rubinstein, 2003, p. 195).*



Sequencing: 

"Putting facts, events, or concepts into a sequence. The author traces the 
development of a topic or gives the steps in the process. Time reference may be 
explicit or implicit, but a sequence is evident in the pattern" (CAR, p. 254). The first 
example is excerpted from a passage about migration in America. The second 
example is an excerpt from a passage about matter describing the steps needed to 
measure something. 

Example 1: 

In 1836 two missionary couples set out on the Oregon Trail. Marcus and Narcissa 
Whitman and Henry and Eliza Spalding hoped to teach the Cayuse people in the 
Oregon Territory about Christianity. After traveling about six months, the Whitmans 
set up their mission near the Columbia River. The mission also served as a resting 
place for travelers .... 

Early in 1847 another group of religious followers headed west. More than. . . . 

Finally, in July 1847, the first group of Mormons reached a large lake now known as 
the Great Salt Lake... By 1850 there were about 5,000 Mormons living in the town 
(Banks et al., 2001, p. 429). 

Example 2: 

Suppose you want to know how much water or some other liquid is in a container of 
some kind. First you pour the liquid from the container into a graduate. Then you 
measure the level of the liquid against the scale marked on the side of the graduate 
(Badders et al., 2000, p. 02).* 



Simile: 
"A figure of speech in which two unlike things are explicitly compared, as in 'she is
like a rose'" (RHCD, p. 1226). In the example below, the way a tornado gains speed
is compared to a spinning figure skater.

Example: 

Like a spinning skater who pulls her arms in close to her sides, the spinning tornado 
gets faster and faster (Moyer et al., 2000, p. 164).* 



Summary: 

To express the most important facts or ideas about something or someone in a 
concise form (CID). The example below summarizes the critical facts students must 
understand in a lesson about plant reproduction. 

Example: 

Most flowers have both male and female reproductive parts. Pollen, which has sperm 
cells, is produced by stamens. The pistil has the eggs. Pollen is transferred from the 
stamens to the pistil. After fertilization, eggs develop into seeds. Many flowers attract 
animals that carry pollen to the pistil. In some plants, pollination depends on the
wind (Frank et al., 2000, p. A119).*

Mathematics Task Features 34 

Justification: 

To provide evidence or give a reason or explanation for something based on 
experience, knowledge, or facts (adapted from CID). In the example below, students 
are asked to provide examples that support their answers. 

Example: 

How many equivalent fractions can be written for any given fraction? Give examples 
to support your thinking (Greenes et al., 2002, p. 319).* 

Write problem or question: 

Students are asked to write a problem or to generate a problem question when given 
certain specifications or guidelines. The first example requires students to generate a 
word problem and the second only asks for the problem question. 

Example 1: 

Write a problem that has too little information to be solved. Then write one that 
includes information that is not needed to solve the problem (Maletsky et al., 2001,
p. 551).*



34 In this subsection, we list features in the task or problem question of mathematics word problems 
that require language production and that only occur in mathematics selections. 
Example 2:

What's the question? Eileen bought 2 gallons of milk at $3.02 per gallon and one loaf 
of bread at $.99. The answer is $7.03 (Maletsky et al., 2001, p. 173).* 
APPENDIX L



Synthesis of Grammatical Features of Language 
Functions in Textbooks and Printed Materials 1 



References Table 11 from Butler et al. (2004). 
Table L1



Synthesis of Grammatical Features of Language Functions in Textbooks and Printed Materials 



Grade cluster                 3-5         3-5         3-5         6-8         Multiple    6-8         9-12
Content area                  Math, Sc    Sc          SS          SS          Math        Hist        Geo, Hist, Sc
Research study                Butler et   Bailey et   Butler et   Butler et   Dale &      Short       Coelho
                              al. (2004)  al. (2004)  al. (1999)  al. (1999)  Cuevas      (1993)      (1982)
                                                                              (1982)

Functions

Comparison/Contrast
  Adverbial comparatives      •           •           •                       •                       •
  Comparative adj forms       •           •                                   •                       •
  Equative comparative forms                                                  •
  Imperative verb forms       •
  Logical connectors:         • (c)       • (b)                   • (a,b,c,d)                         • (a,b,d)
    a) Conflict or contrast
    b) Exemplification
    c) Replacement
    d) Similarity

Description
  Imperative verb forms       •           •
  Logical connectors:         • (a)       • (b)                   • (a,b)     • (b,c)                 • (b/d)
    a) Effect/result
    b) Exemplification
    c) Sequential
    d) Similarity
  Modals                                              •           •
  Nominal structures          •                                               •                       •
  Passive voice               •                                               •                       •
  Phrasal verbs                           •                       •
  Predicate adj structures /  •           •                                   •                       •
    Prepositions
  Simple past /               •           •           •           •
    Simple present
  Subordinate clauses (e.g.,              •                       •           •
    relative clauses)
  Temporal phrases                                    •                                   •

Explanation
  Imperative verb forms /     •                       • (a,b,c)   • (a,b)     • (a,b,c)   • (a)       • (a,b,c)
    Logical connectors:
    a) Cause/reason
    b) Condition
    c) Effect/result

Modals 
Verb cause w/ infinitive •

Note. Sc = Science; SS = Social Science; Hist = History; Geo = Geography; Adj = Adjective. 
APPENDIX M



General Procedures For Text Selection: Stage 1 



Materials needed: 

1. General Guidelines for Stage 1 

2. Subject Area Guidelines with sample texts 

3. Text Selection Checklist Form 

4. Grade Level Standards & Indicators [CA standards] 



Steps: 

1. Review the General Guidelines for Stage 1, the Subject Area Guidelines for the 
subject area you will be selecting texts from, and the Text Selection Checklist 
Form. 

2. Review the first content standard and the performance indicators for that 
standard. 



3. Select one performance indicator for the standard. 

4. Using the Subject Area Guidelines and the selected indicator as guides, 
identify one text. 

5. After identifying the text selection, complete the Text Selection Checklist 
Form to verify that it fits the criteria for selection. 

6. If the text is judged appropriate, complete the following steps: 

(a) Photocopy the text selection; 

(b) Write the name of the textbook and author on the top right-hand
corner of the photocopy and be sure the page numbers are clear;

(c) Write the content standard number and performance indicator
number on the left-hand corner of the photocopy.

7. Attach the completed checklist to the front of the text selection and 
continue selecting texts using the same procedures above. 